Abstract

The theory of Kingman's partition structures has two culminating points:

• the general paintbox representation, relating finite partitions to hypothetical 
infinite populations via a natural sampling procedure, 

• a central example of the theory: the Ewens-Pitman two-parameter partitions. 
In these notes we further develop the theory by 

• passing to structures enriched by the order on the collection of categories, 

• extending the class of tractable models by exploring the idea of regeneration, 

• analysing regenerative properties of the Ewens-Pitman partitions, 

• studying asymptotic features of the regenerative compositions. 

1 Preface 

The kind of discrete regenerative phenomenon discussed here is present in the cycle
patterns of random permutations. To describe this instance, first recall that every
permutation of [n] := {1, ..., n} decomposes into a product of disjoint cycles. The cycle
sizes make up a partition of n into some number of positive integer parts. For instance,
the permutation (1 3)(2) of the set [3] corresponds to the partition of integer 3 with parts
2 and 1. Permutations of different degrees n are connected in a natural way. Starting
with a permutation of [n], a permutation of the smaller set [n - 1] is created by removing
element n from its cycle. This reduction is a surjective n-to-1 mapping. For instance,
the three permutations (1 3)(2), (1)(2 3), (1)(2)(3) are mapped to (1)(2).

Now suppose the permutation is chosen uniformly at random from the set of all n!
permutations of [n]. The collection of cycle sizes is then a certain random partition π_n of
integer n. By the n-to-1 property of the projection, the permutation reduced by element
n is the uniformly distributed permutation of [n - 1], with the cycle partition π_{n-1}. The
transition from π_n to π_{n-1} is easy to describe directly, without reference to underlying
permutations: choose a random part of π_n by a size-biased pick, i.e. with probability
proportional to the size of the part, and then reduce the chosen part by 1. This transition
rule suggests viewing the random partitions with varying n altogether as components of
an infinite partition structure (π_n, n = 1, 2, ...).
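
The reduction and the size-biased transition can be checked by brute force for small n. The following is a minimal sketch; the function names are ours, and a permutation is assumed to be stored as a tuple whose (i - 1)-th entry is the image of i.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def cycle_partition(perm):
    """Ranked cycle sizes of perm, where perm[i - 1] is the image of i."""
    seen, sizes = set(), []
    for start in range(1, len(perm) + 1):
        size, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j - 1]
            size += 1
        if size:
            sizes.append(size)
    return tuple(sorted(sizes, reverse=True))

def remove_last(perm):
    """Remove element n from its cycle: whoever pointed at n now points at n's image."""
    n = len(perm)
    return tuple(perm[n - 1] if image == n else image for image in perm[:n - 1])

def partition_law(n):
    """Exact law of the cycle partition of a uniform permutation of [n]."""
    counts = Counter(cycle_partition(p) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {lam: Fraction(c, total) for lam, c in counts.items()}

def reduce_size_biased(law):
    """Law obtained by lowering a size-biased part by 1 (a part of size 1 disappears)."""
    out = {}
    for lam, prob in law.items():
        n = sum(lam)
        for i, part in enumerate(lam):
            reduced = lam[:i] + ((part - 1,) if part > 1 else ()) + lam[i + 1:]
            key = tuple(sorted(reduced, reverse=True))
            out[key] = out.get(key, Fraction(0)) + prob * Fraction(part, n)
    return out
```

Comparing `reduce_size_biased(partition_law(5))` with `partition_law(4)` illustrates the consistency of the cycle partitions for two neighbouring values of n.
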

Apart from the consistency property inherent to any partition structure, there is another
recursive self-reproduction property of the partitions derived from the cycle patterns
of uniform permutations. Fix n and suppose a part is chosen by a size-biased pick from
π_n and completely deleted. Given the part was m, the partition reduced by this part will
be a distributional copy of π_{n-m}. In this sense the partition structure (π_n, n = 1, 2, ...)
regenerates.
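
The regeneration property can be verified exactly for small n. The sketch below does not enumerate permutations; it assumes Cauchy's formula for the law of the cycle type of a uniform permutation, P(k_r cycles of size r for each r) = ∏_r 1/(r^{k_r} k_r!). The function names are ours.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as nonincreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def cycle_law(n):
    """Law of the cycle partition of a uniform permutation (Cauchy's formula)."""
    law = {}
    for lam in partitions(n):
        p = Fraction(1)
        for r, k in Counter(lam).items():
            p /= r**k * factorial(k)
        law[lam] = p
    return law

def delete_size_biased(n):
    """Joint law of (deleted part, remaining partition) under a size-biased pick."""
    joint = {}
    for lam, p in cycle_law(n).items():
        for i, part in enumerate(lam):
            key = (part, lam[:i] + lam[i + 1:])
            joint[key] = joint.get(key, Fraction(0)) + p * Fraction(part, n)
    return joint
```

Conditioning the output of `delete_size_biased(n)` on the deleted part m and comparing with `cycle_law(n - m)` reproduces the statement of the paragraph above.
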

For large n, the size-biased pick will choose a part with about nU elements, where U
is a random variable with uniform distribution on the unit interval. In the same way, the 
iterated deletion of parts by size-biased picking becomes similar to the splitting of [0, 1] 
at points representable via products of independent uniform variables. The latter is a 
special case of the multiplicative renewal process often called stick-breaking. 

In these notes we consider sequences of partitions and ordered partitions which are 
consistent in the same sense as the cycle patterns of permutations for various n. In 
contrast to that, the assumption about the regeneration property of such structures will 
be fairly general. The connection between combinatorial partitions and splittings of the 
unit interval is central in the theory and will be analysed in detail in the general context 
of regenerative structures. 



2 The paintbox and the two-parameter family 

A composition of integer n is an ordered sequence λ° = (λ_1, ..., λ_k) of positive integer
parts with sum |λ°| := Σ_j λ_j = n. We shall think of a composition as a model of occupancy,
meaning n 'balls' separated by 'walls' into some number of nonempty 'boxes', as in this
diagram

| • • • | • | • • |

representing composition (3, 1, 2). A wall | is either placed between two consecutive •'s
or not, hence there are 2^{n-1} compositions of n. Sometimes we shall also encode
the compositions into binary sequences, in which a 1 followed by some m - 1 zeroes
corresponds to part m, like the code 100110 for composition (3, 1, 2).
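
The binary encoding is easy to make concrete. The following is a small sketch with helper names of our own choosing; it also recovers the count 2^{n-1} by enumerating all codes of length n that start with 1.

```python
from itertools import product

def to_code(comp):
    """Encode each part m as a 1 followed by m - 1 zeroes."""
    return "".join("1" + "0" * (m - 1) for m in comp)

def from_code(code):
    """Decode a binary string starting with 1 back into a composition."""
    parts = []
    for bit in code:
        if bit == "1":
            parts.append(1)      # a 1 opens a new box
        else:
            parts[-1] += 1       # a 0 adds a ball to the current box
    return tuple(parts)

def compositions(n):
    """All compositions of n >= 1, one for each binary code of length n starting with 1."""
    for tail in product("01", repeat=n - 1):
        yield from_code("1" + "".join(tail))
```
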

A related labeled object is an ordered partition of the set [n] := {1, ..., n}, which may
be obtained by some enumeration of the balls by integers 1, ..., n, like

| 2 4 5 | 3 | 1 6 |
  • • •   •   • •

(the ordering of balls within a box is not important). The number of such labelings, that
is the number of ordered set partitions with shape (λ_1, ..., λ_k), is equal to the multinomial
coefficient

f°(λ_1, ..., λ_k) = n! / (λ_1! ··· λ_k!).

Throughout, the symbol ° will denote a function of composition, also when the function is
not sensitive to the permutation of parts.

Discarding the order of parts in a composition (λ_1, ..., λ_k) yields a partition of integer
|λ°|, usually written as a ranked sequence of nonincreasing parts. For instance, the ranking
maps compositions (3, 1, 2) and (1, 3, 2) to the same partition (3, 2, 1)↓, where ↓ will be
used both to denote the operation of ranking and to indicate that the arrangement of
parts in sequence is immaterial. Sometimes we use notation like 2 ∈ (4, 2, 2, 1)↓ to say
that 2 is a part of the partition. The number of partitions of the set [n] with the same shape
λ↓ = (λ_1, ..., λ_k)↓ is equal to

f(λ↓) = n! ∏_{r=1}^{n} 1 / ((r!)^{k_r} k_r!),

where k_r = #{j : λ_j = r} is the number of parts of λ↓ of size r.
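
Both counting functions are straightforward to evaluate. The sketch below assumes the standard forms f°(λ°) = n!/(λ_1! ··· λ_k!) and f(λ↓) = n!/∏_r ((r!)^{k_r} k_r!); the Python names are ours.

```python
from collections import Counter
from math import factorial, prod

def f_ordered(comp):
    """Number of ordered set partitions of [n] with shape comp (multinomial coefficient)."""
    return factorial(sum(comp)) // prod(factorial(m) for m in comp)

def f_unordered(partition):
    """Number of set partitions of [n] whose block sizes form the given partition."""
    denom = 1
    for r, k in Counter(partition).items():
        denom *= factorial(r) ** k * factorial(k)
    return factorial(sum(partition)) // denom
```

The two counts differ by the factor ∏_r k_r!, the number of ways to order blocks of equal size: for shape (2, 1, 1) there are 12 ordered set partitions of each of the three arrangements' common multinomial type and 6 unordered ones.
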

A random composition/partition of n is simply a random variable with values in the
finite set of compositions/partitions of n. One statistical context where these
combinatorial objects appear is the species sampling problem. Imagine an alien who has no idea
of the mammals. Suppose the first six mammals she observes are tiger, giraffe, elephant,
elephant, elephant and giraffe, appearing in this sequence. The most frequent, three of
these, have long trunks, two are distinctively taller than the others, and one is striped. She
records this as partition (3, 2, 1)↓ into three distinct species. Composition (1, 3, 2) could
appear as the record of species abundance by a more delicate classification according to
typical height, from the lowest to the tallest¹. Enumerating the animals in the order of
observation gives a labeled object, a partition/ordered partition of the set [6] = {1, ..., 6}.

There are many ways to introduce random partitions or compositions. The method 
adopted here is intrinsically related to the species sampling problem. This is the following 
ordered version of Kingman's paintbox (see [7], [H], [S]). 

Ordered paintbox Let ℛ be a random closed subset of [0, 1]. The complement open
set ℛ^c := (0, 1) \ ℛ has a canonical representation as a disjoint union of countably many
open interval components, which we shall call the gaps of ℛ. Independently of ℛ, sample
points U_1, U_2, ... from the uniform distribution on [0, 1] and group the points in clusters
by the rule: U_i, U_j belong to the same cluster if they hit the same gap of ℛ. If U_i falls in
ℛ let U_i be a singleton. For each n, count the representatives of clusters among U_1, ..., U_n
and define ϰ_n, a random composition of integer n, to be the record of positive counts in
the left-to-right order of the gaps.

For instance, ϰ_6 assumes the value (3, 1, 2) if, in the left-to-right order, there is a gap hit
by three points out of U_1, ..., U_6, a singleton cluster resulting from either some gap or
from some U_j ∈ ℛ, and a gap hit by two of U_1, ..., U_6.

In the proper case ℛ has Lebesgue measure zero almost surely, hence U_j ∈ ℛ occurs only
with probability zero. We may think then of points of ℛ as possible locations of walls |
and of the points of [0, 1] as possible locations of balls •. In a particular realisation, the
balls appear at locations U_j, and the walls bound the gaps hit by at least one ball. In the
improper case, ℛ may have positive measure with nonzero probability. If U_j ∈ ℛ we can
imagine a box with walls coming so close together that no further ball will fit in this box,
so U_j will forever remain a singleton, no matter how many balls are added.
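
For a paintbox with finitely many points the construction can be written in a few lines. The following is a minimal sketch under that restriction (a general closed set, e.g. of Cantor type, is not covered); the function name and the list representation of the closed set are our assumptions.

```python
from bisect import bisect_left

def paintbox_composition(closed_set, sample):
    """Composition induced by a finite closed subset of [0, 1] and sample points in [0, 1]."""
    walls = sorted(set(closed_set) | {0.0, 1.0})   # endpoints 0 and 1 are adjoined
    gap_counts, singletons = {}, []
    for u in sample:
        i = bisect_left(walls, u)
        if i < len(walls) and walls[i] == u:
            singletons.append(u)                    # u hits the set itself
        else:
            gap_counts[i] = gap_counts.get(i, 0) + 1  # u falls in the gap (walls[i-1], walls[i])
    # Record positive counts from left to right; a singleton sits at its own location.
    items = [(walls[i - 1], c) for i, c in gap_counts.items()]
    items += [(u, 1) for u in singletons]
    return tuple(count for _, count in sorted(items))
```

For example, with walls at 0.5 and 0.6 and six sample points of which three fall below 0.5, one between the walls and two above 0.6, the function returns (3, 1, 2). In a simulation the sample would be drawn with `random.random()`.
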



¹If her guidebook described four species, e.g. these three and the cows, her records would be
(3, 2, 1, 0)↓, (1, 0, 3, 2) (weak partitions, respectively, weak compositions), but we assumed that she knew
a priori really nothing of the mammals.

Sometimes we shall identify ℛ with the splitting of [0, 1] it induces, and just call ℛ
itself the paintbox. Molchanov [38] gives an extensive exposition of the theory of random
sets, although an intuitive idea will suffice for most of our purposes. This can be a set
of some fixed cardinality, e.g. splittings of [0, 1] following a Dirichlet distribution (see
[15], [22]), or complicated random Cantor-type sets like the set of zeroes of the Brownian
motion. It will also be convenient to make no difference between two closed subsets of
[0, 1] if they only differ by the endpoints 0 or 1. If 1 or 0 is not an accumulation point for ℛ, the
gap adjacent to the boundary will be called the right or left meander.

The paintbox with random ℛ is a kind of canonical representation of 'nonparametric
priors' in the species sampling problem. View ℝ as an ordered space of distinct types.
Originally, by Kingman [36], the types were colours making up a paintbox. Consider a
random probability measure F on the reals as a model of an infinite ordered population. Let
ξ_1, ξ_2, ... be a sample from F, which means that conditionally given F, the ξ_j's are i.i.d.
with distribution F. An ordered partition of the sample is defined by grouping j's with
the same value of ξ_j, with the order on the groups maintained by increase of the values.
The case of diffuse (nonatomic) F is trivial: then ties among ξ_j's have probability zero
and the partition has only singletons, so the substantial case is F with atoms, when
the partition will have nontrivial blocks. The same ordered partition is induced by any
other distribution obtained from F by a suitable monotonic transformation, which may be
random. To achieve uniqueness, view F as a random distribution function and observe
that ξ_i < ξ_j iff F(ξ_i) < F(ξ_j). Conditioning on F and applying the quantile transform
y → F(y) to the sample produces another sample ξ̃_1, ξ̃_2, ... from the transformed distribution
F̃ supported by [0, 1]. In the diffuse case, F̃ is well known to be the uniform distribution,
and in general the distribution function F̃ is of a special kind: it satisfies F̃(x) ≤ x for
x ∈ [0, 1] and F̃(x) = x F̃-a.s. Moreover, each jump location of F̃ is preceded by a
flat (where F̃ is constant), whose length is equal to the size of the jump. The latter
implies that the composition derived from F̃ by grouping equal ξ̃_j's in clusters is the
same as the composition obtained via the paintbox construction from ℛ = support(F̃).
The identification with the paintbox construction can be shown more directly, i.e. without
appealing to F̃, by taking for ℛ the range of the random function F (note that support(F̃)
with 0 attached to it coincides with the range of F).

Note further important features inherent to the paintbox construction: 

• The unlabeled object, ϰ_n, is determined by ℛ and the uniform order statistics
U_{n:1} < ... < U_{n:n}, i.e. the ranks of U_1, ..., U_n appear as random labels and do not
matter.

• Attaching label j to the ball corresponding to U_j, we obtain, for each n, an ordered
partition K_n of the set [n], with shape ϰ_n. This ordered partition is exchangeable,
meaning that a permutation of the labels does not change the distribution of K_n,
thus all ordered partitions of [n] with the same shape have the same probability.

• The ordered partitions K_n are consistent as n varies. Removing ball n (and deleting
an empty box in case one is created) reduces K_n to K_{n-1}. The infinite sequence K =
(K_n) of consistent ordered partitions of [1], [2], ... defines therefore an exchangeable
ordered partition of the infinite set N into some collection of nonempty blocks.

Translating the consistency in terms of compositions ϰ_n we arrive at

Definition 2.1. A sequence ϰ = (ϰ_n) of random compositions of n = 1, 2, ... is called
a composition structure if these are sampling consistent: for each n > 1, conditionally
given ϰ_n = (λ_1, ..., λ_k) the composition ϰ_{n-1} has the same distribution as the composition
obtained by reducing by 1 one of the parts, the part λ_j being chosen with probability λ_j/n.

A size-biased part of composition λ° is a random part which coincides with every part
λ_j with probability λ_j/|λ°|. A size-biased part of a random composition ϰ_n is defined
conditionally on the value ϰ_n = λ°. The sampling consistency condition amounts to the
transition from ϰ_n to ϰ_{n-1} by reducing a size-biased part. This special reduction rule in
Definition 2.1 is a trace of the exchangeability in K_n that remains when the labels are
erased: indeed, given the sizes of the blocks, the ball with label n belongs to a particular
block of size λ_j with probability λ_j/n.

Keep in mind that the consistency of ordered set partitions K_n is understood in the
strong sense, as a property of random objects defined on the same probability space,
while Definition 2.1 only requires weak consistency in terms of the distributions of ϰ_n's.
By the measure extension theorem, however, the correspondence between (the laws of)
exchangeable ordered partitions of N and composition structures is one-to-one, and any
composition structure can be realised through an exchangeable ordered partition of N. In
view of this correspondence, dealing with labeled or unlabeled objects is just a matter
of convenience, and we shall freely switch from one model to another.

A central result about the general composition structures says that these can be
uniquely represented by a paintbox [14]. This extends Kingman's [36] representation
of partition structures.

Theorem 2.2. For every composition structure ϰ = (ϰ_n) there exists a unique distribution
for a random closed set ℛ which by means of the paintbox construction yields, for
each n, a distributional copy of ϰ_n.

Sketch of proof The line of the proof is analogous to modern proofs of de Finetti's theorem
which asserts that a sequence of exchangeable random variables is conditionally i.i.d. given
the limiting empirical distribution of the sequence (see Aldous [1]). To this end, we need
to make the concept of a random closed set precise. One way to do this is to topologise
the space of closed subsets of [0, 1] by means of the Hausdorff distance. Recall that for
R_1, R_2 ⊂ [0, 1] (with boundary points 0, 1 adjoined to the sets) the distance is equal to
the smallest ε such that the ε-inflation of R_1 covers R_2 and the same holds with the roles
swapped, so the distance is small when the sizes and positions of a few biggest gaps are
approximately the same for both sets. Realise all ϰ_n's on the same probability space
through some exchangeable K. Encode each composition (λ_1, ..., λ_k) into a finite set
{0, Λ_1/n, ..., Λ_{k-1}/n, 1} where Λ_j = λ_1 + ... + λ_j. This maps ϰ_n to a finite random set
ℛ_n ⊂ [0, 1]. By a martingale argument it is shown that the law of large numbers
holds: as n → ∞ the sets ℛ_n converge almost surely to a random closed set ℛ. The limit
ℛ is shown to direct the paintbox representation of ϰ. □
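
The two ingredients of the sketch, the encoding of a composition by a finite subset of [0, 1] and the Hausdorff distance between finite sets, can be illustrated as follows. The function names are ours, and the distance is computed only for finite sets given as lists.

```python
from fractions import Fraction

def composition_to_set(comp):
    """Finite set {0, Λ_1/n, ..., Λ_{k-1}/n, 1} with Λ_j = λ_1 + ... + λ_j."""
    n, acc, points = sum(comp), 0, [Fraction(0)]
    for part in comp:
        acc += part
        points.append(Fraction(acc, n))
    return points

def hausdorff(a, b):
    """Hausdorff distance between two finite nonempty subsets of the line."""
    def dist(x, s):
        return min(abs(x - y) for y in s)
    return max(max(dist(x, b) for x in a), max(dist(y, a) for y in b))
```

For instance, the sets encoding (3, 1, 2) and (3, 3) are at distance 1/6, since only the wall at 2/3 has no counterpart closer than that.
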

There are various equivalent formulations of the result in terms of (i) the exchangeable
quasi-orders on N (in the spirit of [33]), (ii) the entrance Martin boundary for the
time-reversed Markov chain (ϰ_n, n = ..., 2, 1), (iii) a certain functional on the
infinite-dimensional algebra of quasisymmetric functions [23].

We define the composition probability function (CPF for shorthand) p° of a composition
structure ϰ as

p°(λ°) := P(ϰ_n = λ°), |λ°| = n, n = 1, 2, ...

For fixed |λ°| = n this is the distribution of ϰ_n. To avoid confusion with the distribution of
K_n we stress that the probability of any particular value of the set partition K_n with shape
λ° is equal to p°(λ°)/f°(λ°). Sampling consistency translates as a backward recursion

p°(λ°) = Σ_{μ°} c(λ°, μ°) p°(μ°), (1)

where μ° runs over all shapes of extensions of any fixed ordered partition of [n] with shape
λ° to some ordered partition of [n + 1]. For instance, taking λ° = (2, 3), μ° assumes the
values (1, 2, 3), (2, 1, 3), (2, 3, 1), (3, 3), (2, 4). The coefficient c(λ°, μ°) is the probability to
obtain λ° from μ° by reducing a size-biased part of μ°.
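
The extensions μ° and the coefficients c(λ°, μ°) of the backward recursion can be computed directly. This is a minimal sketch with helper names of our own; compositions are tuples, and a part reduced to 0 is removed.

```python
from fractions import Fraction

def reduce_part(comp, j):
    """Lower part j by 1, deleting it if it was a singleton."""
    middle = (comp[j] - 1,) if comp[j] > 1 else ()
    return comp[:j] + middle + comp[j + 1:]

def c(lam, mu):
    """Probability to obtain lam from mu by reducing a size-biased part of mu."""
    total = sum(mu)
    hits = (Fraction(mu[j], total) for j in range(len(mu)) if reduce_part(mu, j) == lam)
    return sum(hits, Fraction(0))

def extensions(lam):
    """Shapes obtained by adding one ball: to an existing box or as a new singleton box."""
    out = set()
    for j in range(len(lam)):
        out.add(lam[:j] + (lam[j] + 1,) + lam[j + 1:])
    for j in range(len(lam) + 1):
        out.add(lam[:j] + (1,) + lam[j:])
    return sorted(out)
```

For λ° = (2, 3) the function `extensions` returns the five shapes listed in the text, with coefficients 1/6 for the three shapes containing a new singleton, 1/2 for (3, 3) and 2/3 for (2, 4).
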

For fixed n, if p°(λ°) is known for compositions λ° with |λ°| = n, then solving (1)
backwards gives the values of CPF for all compositions with |λ°| ≤ n. By linearity of
the recursion, every such partial solution, with n' ≤ n, is a convex combination of 2^{n-1}
solutions obtained by taking delta measures on the level n. Similarly, without restricting
n, the set of CPF's is convex and compact in the weak topology of functions on a countable
set; this convex set has the property of uniqueness of barycentric decomposition in terms
of extreme elements (Choquet simplex). The extreme CPF's are precisely those derived
from nonrandom paintboxes. The correspondence between extreme solutions and closed
subsets of [0, 1] is a homeomorphism, which extends to the homeomorphism between all
CPF's and distributions for random closed ℛ.

Discarding the order of parts in each ϰ_n we obtain Kingman's partition structure π =
(π_n) with π_n = ϰ_n↓. Partition structures satisfy the same sampling consistency condition as
in Definition 2.1. The corresponding labeled object is an exchangeable partition Π = (Π_n)
of the infinite set N. The law of large numbers for partition structures says that, as
n → ∞, the vector n^{-1} π_n padded by infinitely many zeroes converges (weakly for π_n,
strongly for Π_n) to a random element S of the infinite-dimensional simplex

∇ = {(s_i) : s_1 ≥ s_2 ≥ ... ≥ 0, Σ_i s_i ≤ 1},

so the components of S are the asymptotic frequencies of the ranked parts of ϰ_n. The
partition probability function (PPF)

p(λ↓) := P(π_n = λ↓), |λ↓| = n, n = 1, 2, ...,

specifies distributions of π_n's and satisfies a recurrence analogous to (1). The correspondence
between PPF's and distributions for unordered paintbox S is bijective. Note that
the possibility of strict inequality Σ_j s_j < 1 occurs in the improper case, where the diffuse
mass 1 - Σ_j s_j, sometimes also called dust [7], is equal to the cumulative frequency of
singleton blocks of Π given S = (s_j).

Discarding order is a relatively easy operation. In terms of ordered and unordered
paintboxes ℛ and S the connection is expressed by the formula

S = (ℛ^c)↓, (2)

where the ranking ↓ means that the gap-sizes of ℛ are recorded in nonincreasing order.
The operation ↓ is a continuous mapping from the space of closed subsets of [0, 1] to ∇.
In terms of distributions, passing from CPF to PPF is expressed by the symmetrisation
formula

p(λ↓) = Σ_{λ°} p°(λ°), (3)

where λ° runs over all distinct arrangements of parts of λ↓ in a composition (e.g. for
partition (2, 1, 1)↓ there are three such compositions (2, 1, 1), (1, 2, 1), (1, 1, 2)).

In the other direction, there is one universal way to introduce the order. With every
partition structure one can associate a unique symmetric composition structure, for which
any of the following three equivalent conditions holds:

(i) all terms in the RHS of (3) are equal,

(ii) conditionally given Π_n with k blocks, any arrangement of the blocks in K_n has the
same probability,

(iii) the gaps of ℛ appear in the exchangeable random order.

The last property (iii) means that, conditionally given S = (s_j) with s_k > 0, every relative
order of the first k largest gaps (labeled by [k]) of sizes s_1, ..., s_k has probability 1/k!.
This rule defines ℛ unambiguously in the proper case, and extension to the improper case
follows by continuity. A simple example of symmetric ℛ is associated with splitting [0, 1]
according to the symmetric Dirichlet distribution on a finite-dimensional simplex.

Besides the symmetric composition structure, there are many other composition
structures associated with a given partition structure. Understanding the connection in
the direction from unordered to ordered structures is a difficult problem of arrangement.
To outline some facets of the problem, suppose we have a rule to compute p° from p, how
can we pass then from S to ℛ? Specifically, given S = (s_j), in which order should the intervals
of sizes s_1, s_2, ... be arranged in an open set? The other way round, suppose we have
a formula for p and know that (2) is true, how then can we compute the probability that
given π_5 = (3, 2)↓ the parts appear in the composition as (2, 3)? Most questions like that
cannot have universal answers, because random sets and random series are objects of high
complexity, and the paintbox correspondence cannot be expressed by simple formulas.

Ewens-Pitman partition structures In the theory of partition structures and
partition-valued processes of fragmentation and coagulation [7] a major role is played by the
Ewens-Pitman two-parameter family of partitions, with PPF

p_{α,θ}(λ↓) = f(λ↓) [∏_{i=1}^{k-1} (θ + αi) / (θ + 1)_{n-1}] ∏_{j=1}^{k} (1 - α)_{λ_j - 1}, (4)

where and henceforth (z)_n := z(z + 1) ··· (z + n - 1) is a rising factorial. The principal
range of the parameters is

{(α, θ) : 0 ≤ α < 1, θ > -α} ∪ {(α, θ) : α < 0, -θ/α ∈ N}, (5)

and there are also a few degenerate boundary cases defined by continuity.

One of many remarkable features of these partitions is the sequential device for generating
the corresponding exchangeable partition Π = (Π_n). Start with the one-element
partition Π_1. Inductively, suppose Π_n has been constructed. Then, given that the shape
of Π_n is (λ_1, ..., λ_k)↓, the ball n + 1 is placed in the existing box i with probability
(λ_i - α)/(n + θ) for i = 1, ..., k, and starts a new box with probability (θ + kα)/(n + θ). In
the Dubins-Pitman interpretation as a 'Chinese restaurant process', the balls correspond
to customers arriving in the restaurant, and boxes are circular tables. With account of
the circular ordering of customers at each occupied table, and subject to uniform random
placement at each particular table, the process also defines a consistent sequence of random
permutations for n = 1, 2, ..., with uniform distributions in the case (α, θ) = (0, 1).
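
The sequential rule determines the law of the shape of Π_n, which can be computed exactly and compared with a closed formula. The sketch below assumes the standard form of the two-parameter PPF, p_{α,θ}(λ↓) = f(λ↓) ∏_{i=1}^{k-1}(θ + αi) ∏_{j=1}^{k}(1 - α)_{λ_j - 1} / (θ + 1)_{n-1}; the function names are ours and the parameters are passed as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def rising(z, n):
    """Rising factorial (z)_n = z (z + 1) ... (z + n - 1)."""
    return prod((z + i for i in range(n)), start=Fraction(1))

def crp_shape_law(n, alpha, theta):
    """Exact law of the ranked box sizes after n balls placed by the sequential rule."""
    law = {(1,): Fraction(1)}
    for size in range(1, n):
        nxt = {}
        for lam, p in law.items():
            # Join existing box i with weight (λ_i - α); open a new box with weight θ + kα.
            moves = [(lam[:i] + (part + 1,) + lam[i + 1:], part - alpha)
                     for i, part in enumerate(lam)]
            moves.append((lam + (1,), theta + len(lam) * alpha))
            for shape, weight in moves:
                key = tuple(sorted(shape, reverse=True))
                nxt[key] = nxt.get(key, 0) + p * weight / (size + theta)
        law = nxt
    return law

def ppf(lam, alpha, theta):
    """Assumed closed form of the two-parameter partition probability function."""
    n, k = sum(lam), len(lam)
    shapes = factorial(n)
    for r, mult in Counter(lam).items():
        shapes //= factorial(r) ** mult * factorial(mult)
    new_boxes = prod((theta + alpha * i for i in range(1, k)), start=Fraction(1))
    seats = prod((rising(1 - alpha, m - 1) for m in lam), start=Fraction(1))
    return shapes * new_boxes * seats / rising(theta + 1, n - 1)
```

With (α, θ) = (0, 1) the law returned by `crp_shape_law` is that of the cycle partition of a uniform permutation, e.g. a single box of size n has probability 1/n.
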

The two-parameter family has numerous connections to basic types of random pro- 
cesses like the Poisson process and the Brownian motion, see Pitman's lecture notes [13] 
for a summary. It also provides an exciting framework for the problem of arrangement. 

3 Regenerative composition structures 

Every ϰ_n in a composition structure may be regarded as a reduced copy of ϰ_{n+1}. We
complement this now by another type of self-reproduction property, related to the reduction
by a whole box.

Definition 3.1. A composition structure ϰ = (ϰ_n) is called regenerative if for all
n > m ≥ 1, the following deletion property holds. If the first part of ϰ_n is deleted and
conditionally given this part is m, the remaining composition of n - m is distributed like
ϰ_{n-m}.

Denote by F_n the first part of ϰ_n and consider its distribution

q(n : m) := Σ_{|λ°| = n, λ_1 = m} p°(λ°).

It follows immediately from the definition that ϰ is regenerative iff the CPF has the
product form

p°(λ_1, ..., λ_k) = ∏_{j=1}^{k} q(Λ_j : λ_j), (6)

where Λ_j = λ_j + ... + λ_k for 1 ≤ j ≤ k.

For each n, the formula identifies ϰ_n with the sequence of decrements of a decreasing
Markov chain Q_n↓ = (Q_n↓(t), t = 0, 1, ...) on 0, ..., n. The chain starts at n, terminates
at 0, and jumps from n' ≤ n to n' - m with probability q(n' : m). The binary code of
ϰ_n is obtained by writing 1's in positions n - Q_n↓(t) + 1, t = 0, 1, ..., and writing 0's in all
other positions, with the convention that the last 1 in position n + 1 is not included in
the code. In view of this interpretation, we call q = (q(n : m), 1 ≤ m ≤ n, n ∈ N) the
decrement matrix of ϰ. Since p° is computable from q, the decrement matrix determines
completely the distributions of ϰ_n's and the distribution of the associated exchangeable
ordered partition K.
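
The passage from a decrement matrix to the CPF, and from the decreasing chain to the binary code, can be sketched as follows. The decrement matrix is passed as a function q(n, m); the choice q(n : m) = 1/n used for illustration is an assumption made here (the text identifies it later with the case θ = 1), and not every stochastic matrix q corresponds to a composition structure. All names are ours.

```python
from fractions import Fraction
from itertools import product

def cpf(comp, q):
    """Product form: multiply q(Λ_j : λ_j) over the parts, Λ_j being the tail sums."""
    p, remaining = Fraction(1), sum(comp)
    for part in comp:
        p *= q(remaining, part)
        remaining -= part
    return p

def binary_code(comp):
    """Write 1 at position n - Q(t) + 1 for each visited state Q(t) > 0 of the chain."""
    n = sum(comp)
    state, ones = n, set()
    for part in comp:
        ones.add(n - state + 1)
        state -= part            # the chain jumps down by the next part
    return "".join("1" if pos in ones else "0" for pos in range(1, n + 1))

def compositions(n):
    """All compositions of n >= 1, indexed by wall patterns between consecutive balls."""
    for walls in product((0, 1), repeat=n - 1):
        comp, run = [], 1
        for wall in walls:
            if wall:
                comp.append(run)
                run = 1
            else:
                run += 1
        yield tuple(comp + [run])

def uniform_q(n, m):
    """Illustrative decrement matrix q(n : m) = 1/n (an assumption of this sketch)."""
    return Fraction(1, n)
```

Summing `cpf` over all compositions with a given first part m recovers q(n : m), and the code of (3, 1, 2) is 100110 as before.
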

For a given regenerative ϰ let π = (π_n), with π_n = ϰ_n↓, be the related partition
structure. Think of ϰ_n as an arrangement of parts of π_n in some order. For partition λ↓
of n and each m ∈ λ↓ define the deletion kernel

d(λ↓, m) = P(F_n = m | π_n = λ↓),

which specifies the conditional probability, given the unordered multiset of parts, to place
a part of size m in the first position in ϰ_n (so d(λ↓, m) = 0 if m ∉ λ↓). The deletion
property of ϰ implies that the PPF of π satisfies the identity

p(λ↓) d(λ↓, m) = q(n : m) p(λ↓ \ {m}), (7)

where q(n : ·), the distribution of F_n, may be written in terms of the deletion kernel as

q(n : m) = Σ_{λ↓ : |λ↓| = n, m ∈ λ↓} d(λ↓, m) p(λ↓). (8)

Intuitively, the deletion kernel is a stochastic algorithm of choosing a part of partition
π_n to place it in the first position of composition ϰ_n. Iterated choices arrange all parts
of each π_n in ϰ_n, hence the deletion kernel may be used to describe the arrangement
on the level of finite partitions. The partition structure π inherits from ϰ the property
of invariance under deletion of a part chosen by some random rule, expressed formally
as (7) and (8). This is, of course, a subtle property when compared with the more obvious
invariance of ϰ under the first-part deletion, as specified in Definition 3.1.
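
The deletion kernel can be computed from a decrement matrix by summing the product-form CPF over arrangements. The sketch below assumes the illustrative decrement matrix q(n : m) = 1/n, for which the parts come out in size-biased order; the function names are ours.

```python
from fractions import Fraction
from itertools import permutations

def q(n, m):
    """Illustrative decrement matrix q(n : m) = 1/n (an assumption of this sketch)."""
    return Fraction(1, n)

def cpf(comp):
    """Product-form CPF built from q."""
    p, remaining = Fraction(1), sum(comp)
    for part in comp:
        p *= q(remaining, part)
        remaining -= part
    return p

def arrangements(partition):
    """Distinct compositions obtained by ordering the parts of the partition."""
    return set(permutations(partition))

def ppf(partition):
    """Symmetrisation: sum the CPF over distinct arrangements."""
    return sum(cpf(c) for c in arrangements(partition))

def deletion_kernel(partition, m):
    """d(λ↓, m): conditional probability that the first part equals m."""
    first = sum(cpf(c) for c in arrangements(partition) if c[0] == m)
    return first / ppf(partition)

def remove_part(partition, m):
    """The partition λ↓ \\ {m} with one part of size m deleted."""
    rest = list(partition)
    rest.remove(m)
    return tuple(rest)
```

For λ↓ = (3, 2, 2, 1) the kernel gives d(λ↓, 2) = 1/2, that is k_m m/n with k_2 = 2, and the identity p(λ↓) d(λ↓, m) = q(n : m) p(λ↓ \ {m}) holds for each part m.
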

3.1 Compositions derived from stick-breaking 

Exploiting the paintbox construction we shall give a large family of examples of regener- 
ative composition structures. The method is called stick-breaking, and it is also known 
under many other names like e.g. residual allocation model or, deeper in history, random 
alms pi] . 

Let (W_i) be independent copies of some random variable W with range 0 < W ≤ 1.
A value of W is chosen, and the unit stick [0, 1] is broken at location W in two pieces,
then the left piece of size W is frozen, and the right piece of size 1 - W is broken again
in proportions determined by another copy of W, and so on ad infinitum. The locations
of breaks make up a random set ℛ with points

Y_k = 1 - ∏_{i=1}^{k} (1 - W_i), k = 1, 2, ..., (9)

so the gaps are ℛ^c = ∪_{k=0}^{∞} (Y_k, Y_{k+1}), where Y_0 = 0. The cardinality of ℛ is finite if P(W = 1) > 0,
but otherwise infinite, with points accumulating only at the right endpoint of the unit
interval. By the i.i.d. property of the proportions, the part of ℛ to the right of Y_1 is a
scaled copy of the whole set,

(ℛ ∩ [Y_1, 1] - Y_1) / (1 - Y_1) =_d ℛ, (10)

and this re-scaled part of ℛ is independent of Y_1.

Suppose a composition structure ϰ is derived from the paintbox ℛ = {Y_j, j =
0, 1, ...}. If (0, Y_1) contains at least one of the first n uniform points U_j, then the first
part of the composition ϰ_n is equal to the number of uniforms hitting this interval.
Otherwise, conditionally given Y_1, the sample comes from the uniform distribution on [Y_1, 1].
Together with the property (10) of ℛ this implies

q(n : m) = \binom{n}{m} E(W^m (1 - W)^{n-m}) + E(1 - W)^n q(n : m),

whence the law of the first part of ϰ_n is

q(n : m) = \binom{n}{m} E(W^m (1 - W)^{n-m}) / E(1 - (1 - W)^n), m = 1, ..., n, (11)

which is a mixture of binomial distributions conditioned on a positive value. The key
property (10) we exploited can be generalised for every Y_k ∈ ℛ, from which iterating the
argument we obtain the product formula (6).
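
The break points and the law of the first part can be evaluated exactly once the mixed moments E(W^a (1 - W)^b) are available. The sketch below takes these moments as a user-supplied function and illustrates it with a deterministic W = x, which is our choice for the example; all names are ours.

```python
from fractions import Fraction
from math import comb

def stick_points(ws):
    """Break points Y_0 = 0, Y_k = 1 - (1 - W_1) ... (1 - W_k) for given proportions."""
    points, remaining = [Fraction(0)], Fraction(1)
    for w in ws:
        remaining *= 1 - w       # length of the piece still to be broken
        points.append(1 - remaining)
    return points

def first_part_law(n, moment):
    """Law of the first part; moment(a, b) must return E(W^a (1 - W)^b)."""
    miss_all = moment(0, n)      # probability that no sample point lands in (0, Y_1)
    return {m: comb(n, m) * moment(m, n - m) / (1 - miss_all) for m in range(1, n + 1)}

def delta_moment(x):
    """Mixed moments of the deterministic variable W = x."""
    return lambda a, b: x**a * (1 - x)**b
```

For W = 1/2 the break points are 0, 1/2, 3/4, 7/8, ..., and for any x the returned law is a binomial distribution conditioned to be positive.
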

Concrete examples are obtained by choosing a distribution for W. For instance, taking
delta measure δ_x with some x ∈ (0, 1) yields ℛ = {1 - (1 - x)^k, k = 0, ..., ∞}, which
induces the same composition structure as the one associated with sampling from the
geometric distribution on the set of integers. This composition structure was studied in
many contexts, including the theory of records and random search algorithms.

Expectations involved in (11) may be computed explicitly only in some cases, e.g. for
W with polynomial density, but even then the product formula (6) rarely simplifies.

Example Here is an example of a relatively simple decrement matrix. Taking W with
the general two-parameter beta density

ν(dx) = x^{γ-1} (1 - x)^{θ-1} dx / B(γ, θ), (γ, θ > 0) (12)

we arrive at

q(n : m) = \binom{n}{m} (γ)_m (θ)_{n-m} / ((γ + θ)_n - (θ)_n). (13)

The product formula (6) simplifies moderately for general integer γ [16], and massively
in the following case γ = 1.
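
The beta case can be cross-checked numerically in exact arithmetic. The sketch below assumes that the decrement matrix for the beta(γ, θ) density has the form q(n : m) = \binom{n}{m} (γ)_m (θ)_{n-m} / ((γ + θ)_n - (θ)_n), which is what one obtains from the first-part law by using the beta moments E(W^a (1 - W)^b) = (γ)_a (θ)_b / (γ + θ)_{a+b}. The function names are ours.

```python
from fractions import Fraction
from math import comb, factorial, prod

def rising(z, n):
    """Rising factorial (z)_n = z (z + 1) ... (z + n - 1)."""
    return prod((z + i for i in range(n)), start=Fraction(1))

def q_beta(n, m, gamma, theta):
    """Assumed closed form of the decrement matrix for W with beta(γ, θ) density."""
    top = comb(n, m) * rising(gamma, m) * rising(theta, n - m)
    return top / (rising(gamma + theta, n) - rising(theta, n))

def q_from_moments(n, m, gamma, theta):
    """The same quantity computed from the mixed beta moments."""
    def moment(a, b):
        return rising(gamma, a) * rising(theta, b) / rising(gamma + theta, a + b)
    return comb(n, m) * moment(m, n - m) / (1 - moment(0, n))

def q_gamma_one(n, m, theta):
    """The case γ = 1: binom(n, m) (θ)_{n-m} m! / ((θ + 1)_{n-1} n)."""
    return comb(n, m) * rising(theta, n - m) * factorial(m) / (rising(theta + 1, n - 1) * n)
```

Each row of `q_beta` sums to 1, and setting γ = 1 reproduces the simpler expression in `q_gamma_one`.
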

Regenerative composition structures associated with Ewens' partitions Now
suppose W has a beta(1, θ) density

ν(dx) = θ(1 - x)^{θ-1} dx, x ∈ (0, 1).

Evaluating beta integrals in (11) we find the decrement matrix

q(n : m) = \binom{n}{m} (θ)_{n-m} m! / ((θ + 1)_{n-1} n), (14)

and massive cancellation in (6) gives the CPF

p°_{0,θ}(λ_1, ..., λ_k) = (θ^k n! / (θ)_n) ∏_{j=1}^{k} 1/Λ_j, (15)

with Λ_j = λ_j + ... + λ_k. Symmetrisation (3) gives the PPF known as the Ewens sampling
formula (ESF)

p_{0,θ}(λ↓) = f(λ↓) (θ^k / (θ)_n) ∏_{j=1}^{k} (λ_j - 1)!, (16)

which is a special case of (4). Recall that the combinatorial factor is the number of
set partitions of [n] with given shape. The range of the parameter is θ ∈ [0, ∞], with the
boundary cases defined by continuity.

For θ = 1, the distribution of W is uniform[0, 1] and q(n : m) = n^{-1} is a discrete uniform
distribution for each n; the associated partition π_n is the same as the cycle partition
of a uniform random permutation of [n]. For general θ, the ESF corresponds to a biased
permutation, which for each n takes a particular value with probability θ^{#cycles} / (θ)_n.
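
The chain of identities in this example, from the decrement matrix through the product form to the closed CPF and, after symmetrisation, to the ESF, can be checked in exact arithmetic. The sketch assumes the forms q(n : m) = \binom{n}{m} (θ)_{n-m} m! / ((θ + 1)_{n-1} n), p°(λ°) = θ^k n! / ((θ)_n ∏_j Λ_j) and p(λ↓) = f(λ↓) θ^k ∏_j (λ_j - 1)! / (θ)_n; the function names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb, factorial, prod

def rising(z, n):
    """Rising factorial (z)_n = z (z + 1) ... (z + n - 1)."""
    return prod((z + i for i in range(n)), start=Fraction(1))

def q(n, m, theta):
    """Decrement matrix for W with beta(1, θ) density."""
    return comb(n, m) * rising(theta, n - m) * factorial(m) / (rising(theta + 1, n - 1) * n)

def cpf_product(comp, theta):
    """CPF via the product form over tail sums."""
    p, remaining = Fraction(1), sum(comp)
    for part in comp:
        p *= q(remaining, part, theta)
        remaining -= part
    return p

def cpf_closed(comp, theta):
    """CPF after cancellation: θ^k n! / ((θ)_n Λ_1 ... Λ_k)."""
    n, k = sum(comp), len(comp)
    tails = [sum(comp[j:]) for j in range(k)]
    return theta**k * factorial(n) / (rising(theta, n) * prod(tails))

def esf(partition, theta):
    """Ewens sampling formula for a ranked partition."""
    n, k = sum(partition), len(partition)
    shapes = factorial(n)
    for r, mult in Counter(partition).items():
        shapes //= factorial(r) ** mult * factorial(mult)
    return shapes * theta**k * prod(factorial(m - 1) for m in partition) / rising(theta, n)
```

Summing `cpf_closed` over the distinct arrangements of a partition, e.g. `set(permutations((3, 2, 1)))`, returns the value of `esf` for that partition.
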

We shall call ϰ with CPF (15) Ewens' regenerative composition structure. The problem
of arrangement has in this case a simple explicit solution. For partition (λ_1, ..., λ_k)↓
the size-biased permutation is the random arrangement of parts obtained by the iterated
size-biased picking without replacement. For nonnegative (s_j) ∈ ∇ with Σ_j s_j = 1 define
a size-biased permutation in a similar way: a generic term s_j is placed in position
1 with probability proportional to s_j, then another term is chosen by a size-biased pick
from the remaining terms and placed in position 2, etc. The resulting random sequence
is then in the size-biased order, hence the distribution of the sequence is invariant under
the size-biased permutation².

Theorem 3.2. Ewens' composition structure (15) has parts in the size-biased order,
for every n. Conversely, if a regenerative composition structure has parts in the size-biased
order, then its CPF is (15) for some θ ∈ [0, ∞].

The paintbox also has a similar property: the intervals (Y_j, Y_{j+1}) are in the size-biased
order. The law of frequencies S is known as the Poisson-Dirichlet distribution. The law of
the gap-sizes (Y_1 - Y_0, Y_2 - Y_1, ...) is called the GEM distribution.

Remark For set partitions, size-biased ordering is sometimes understood as the arranging
of blocks of Π_n by increase of their minimal elements (other often used names: age
ordering, sampling ordering). This creates an ordered partition for each n, but this ordered
partition is not exchangeable, since e.g. element 1 is always in the first block. In Ewens'
case, but not in general, the unlabeled compositions associated with the arrangement by
increase of the minimal elements of blocks are sampling consistent as in Definition 2.1
(this observation is due to Donnelly and Joyce [I3]). The last assertion is just another
formulation of Theorem 3.2.

²To define a size-biased arrangement in the improper case Σ_j s_j < 1 consider any closed set R ⊂ [0, 1]
with gap-sizes (s_j). Sample uniformly balls U_j and record the gap-sizes by increase of the minimal labels
of balls, with understanding the points of R as zero-size gaps.

3.2 Composition structure derived from the zero set of BM 

Consider the process $(B_t, t\ge 0)$ of Brownian motion (BM), and let $\mathcal{Z} = \{t : B_t = 0\}$ be the zero set of BM. The complement $\mathbb{R}\setminus\mathcal{Z}$ is the union of the excursion intervals, where the BM is away from zero. Define $\mathcal{R}$ as $\mathcal{Z}$ restricted to $[0,1]$. There is a meander gap between the last zero of BM and 1, caused by an incomplete excursion interrupted at $t=1$, but to the left of the meander the set $\mathcal{R}$ is of the Cantor type, without isolated points. Thus the gaps cannot be simply enumerated from left to right, as in the stick-breaking case. Since the BM is a recurrent process with the strong Markov property, the set of zeroes to the right of the generic excursion interval is a shifted distributional copy of the whole $\mathcal{Z}$, independent of the part of $\mathcal{Z}$ to the left of (and including) the excursion interval. This implies that $\mathcal{Z}$ is a regenerative set, a property familiar from the elementary renewal theory. The scaling property of the BM, $(c^{-1/2}B_{ct}) \stackrel{d}{=} (B_t)$, implies the self-similarity, $c\mathcal{Z} \stackrel{d}{=} \mathcal{Z}$ for $c>0$, i.e. the invariance of $\mathcal{Z}$ under homotheties.

Following Pitman [41], consider the composition structure $\varkappa$ derived from $\mathcal{R} = \mathcal{Z}\cap[0,1]$. To check the deletion property in Definition 2.1 it is convenient to modify the paintbox model in a way accounting for the self-similarity.

Modified sampling scheme Let $\mathcal{Z}$ be a random self-similar subset of $\mathbb{R}$. Fix $n$ and let $X_1 < X_2 < \dots$ be the points of a unit Poisson process, independent of $\mathcal{Z}$. The interval $[0, X_{n+1}]$ is split in components at points of $\mathcal{Z}$, so we can define a composition $\varkappa_n$ of $n$ by grouping $X_1,\dots,X_n$ in clusters within $[0, X_{n+1}]$. As $n$ varies, these compositions comprise the same composition structure as the one induced by the standard paintbox construction with $\mathcal{Z}\cap[0,1]$, because (i) the vector $(X_1/X_{n+1},\dots,X_n/X_{n+1})$ is distributed like the vector of $n$ uniform order statistics $(U_{n:1},\dots,U_{n:n})$ and (ii) by self-similarity, $\mathcal{Z}/X_{n+1} \stackrel{d}{=} \mathcal{Z}$.

Note that, because the locations of 'balls' vary with $n$, the model secures a weak consistency of the $\varkappa_n$'s, but does not produce strongly consistent ordered set partitions $K_n$. Applying the modified scheme in the BM case, the deletion property is obvious from the regeneration of $\mathcal{Z}$ and of the homogeneous Poisson process, combined with the self-similarity of $\mathcal{Z}$.

3.3 Regenerative sets and subordinators 

In the stick-breaking case the regeneration property of the induced composition structure $\varkappa$ followed from the observation that $\mathcal{R}$ remains in a sense the same when its left meander is truncated. This could not be applied in the BM case, since the leftmost gap does not exist. By a closer look it is seen that a weaker property of $\mathcal{R}$ would suffice. For a given closed $\mathcal{R}\subset[0,1]$ define the 'droite' point $Z_x := \min\{\mathcal{R}\cap[x,1]\}$, $x\in[0,1]$, which is the right endpoint of the gap covering $x$ (or $x$ itself in the event $x\in\mathcal{R}$).
Definition 3.3. A random closed set $\mathcal{R}\subset[0,1]$ is called multiplicatively regenerative (m-regenerative for short) if $Z_x$ is independent of $(1-Z_x)^{-1}((\mathcal{R}\cap[Z_x,1]) - Z_x)$ and, given $Z_x<1$, the distributional identity is fulfilled

$$\frac{(\mathcal{R}\cap[Z_x,1]) - Z_x}{1-Z_x} \stackrel{d}{=} \mathcal{R}$$

for every $x\in[0,1)$.

Remark We do not require explicitly the independence of $[0,Z_x]\cap\mathcal{R}$ and $(1-Z_x)^{-1}((\mathcal{R}\cap[Z_x,1]) - Z_x)$, which would correspond to the conventional regeneration property in the additive theory. In fact, this apparently stronger property follows from the weaker independence property due to the connection to composition structures. See [21] for details and connection to the bulk-deletion properties of composition structures.

For an m-regenerative paintbox $\mathcal{R}$ the deletion property of $\varkappa$ follows by considering the gap that covers $U_{n:1} = \min(U_1,\dots,U_n)$. Then $q(n:\cdot)$ is the distribution of the rank of the largest order statistic in this gap.

To relate Definition 3.3 with the familiar (additive) concept of regenerative set, recall that a subordinator $(S_t, t\ge 0)$ is an increasing right-continuous process with $S_0 = 0$ and stationary independent increments (Lévy process). The fundamental characteristics of a subordinator are the Lévy measure $\tilde\nu$ on $(0,\infty]$, which controls the intensity and sizes of jumps, and the drift coefficient $d\ge 0$ responsible for a linear drift component. The distribution is determined by means of the Laplace transform

$$E[\exp(-\rho S_t)] = \exp[-t\Phi(\rho)], \qquad \rho\ge 0,$$

where the Laplace exponent is given by the Lévy-Khintchine formula

$$\Phi(\rho) = \rho d + \int_{(0,\infty]} (1-e^{-\rho y})\,\tilde\nu(dy). \qquad (17)$$

The Lévy measure must satisfy the condition $\Phi(1)<\infty$, which implies $\tilde\nu[y,\infty]<\infty$ and also restricts the mass near 0, to avoid immediate passage of the subordinator to $\infty$. A positive mass at $\infty$ is allowed, in which case $(S_t)$ (in this case sometimes called killed subordinator) jumps to $\infty$ at some exponential time with rate $\tilde\nu\{\infty\}$. Two standard examples of subordinators are

1. Stable subordinators with parameter $0<\alpha<1$, characterised by

$$\tilde\nu(dy) = \frac{c\,\alpha}{\Gamma(1-\alpha)}\,y^{-\alpha-1}\,dy, \qquad d = 0, \qquad \Phi(\rho) = c\rho^{\alpha}.$$

2. Gamma subordinators with parameter $\theta>0$, characterised by

$$\tilde\nu(dy) = c\,y^{-1}e^{-\theta y}\,dy, \qquad d = 0, \qquad \Phi(\rho) = c\log(1+\rho/\theta).$$

The constant $c>0$ can be always eliminated by a linear time-change.

Let $\widetilde{\mathcal{R}} = \{S_t, t\ge 0\}^{\rm cl}$ be the closed range of a subordinator. By properties of the increments, $\widetilde{\mathcal{R}}$ is regenerative: for $Z_y$ the 'droite' point at $y>0$, conditionally given $Z_y<\infty$, the random set $(\widetilde{\mathcal{R}} - Z_y)\cap[0,\infty]$ is distributed like $\widetilde{\mathcal{R}}$ and is independent of $[0,Z_y]\cap\widetilde{\mathcal{R}}$ and $Z_y$. Also the converse is true: by a result of Maisonneuve [SB] every regenerative set is the closed range of some subordinator, with $(\tilde\nu, d)$ determined uniquely up to a positive multiple.

Call the increasing process $(1-\exp(-S_t), t\ge 0)$ multiplicative subordinator, and let $\mathcal{R} = 1-\exp(-\widetilde{\mathcal{R}})$ be its range. The regeneration property of $\widetilde{\mathcal{R}}$ readily implies that $\mathcal{R}$ is m-regenerative. As time passes, the multiplicative subordinator proceeds from 0 to 1, thus it is natural to adjust the Lévy measure to the multiplicative framework by transforming $\tilde\nu$, by virtue of $y\mapsto 1-e^{-y}$, into some measure $\nu$ on $(0,1]$, which accounts now for a kind of continuous-time stick-breaking. We shall still call $\nu$ the Lévy measure where there is no ambiguity. In these terms the Lévy-Khintchine formula becomes

$$\Phi(\rho) = \rho d + \int_0^1 \{1-(1-x)^{\rho}\}\,\nu(dx). \qquad (18)$$

For integer $1\le m\le n$ introduce also the binomial moments of $\nu$

$$\Phi(n:m) = \binom{n}{m}\int_0^1 x^m(1-x)^{n-m}\,\nu(dx) + 1(m=1)\,nd$$

(where $1(\cdots)$ stands for indicator), so that $\Phi(n) = \sum_{m=1}^n \Phi(n:m)$. According to one interpretation of (17), $\Phi(\rho)$ is the probability rate at which the subordinator passes through an independent exponential level with mean $1/\rho$. Similarly, $\Phi(n)$ is the rate at which the multiplicative subordinator passes through $U_{n:1}$ and $\Phi(n:m)$ is the rate to jump from below $U_{n:1}$ to a value between $U_{n:m}$ and $U_{n:m+1}$. From this, the probability that the first passage through $U_{n:1}$ covers $m$ out of $n$ uniform points is equal to

$$q(n:m) = \frac{\Phi(n:m)}{\Phi(n)}, \qquad (19)$$

which is the general representation for the decrement matrix of a regenerative composition structure associated with an m-regenerative set. The proper case corresponds to the zero drift, $d=0$; then passage through a level can only occur by a jump.

In the case of finite $\tilde\nu$ and $d=0$ the subordinator is a compound Poisson process with no drift. Scaling $\nu$ to a probability measure, the range of $(1-\exp(-S_t), t\ge 0)$ is a stick-breaking set with the generic factor $W$ distributed according to $\nu$; then (19) becomes (11).

The connection between regenerative compositions structures and regenerative sets 
also goes in the opposite direction. 

Theorem 3.4. Every regenerative composition structure can be derived by the paintbox construction from the range of a multiplicative subordinator, whose parameters $(\nu, d)$ are determined uniquely up to a positive multiple.
Sketch of proof Sampling consistency together with the regeneration imply that the first $n$ rows of the minor $(q(n':\cdot), n'\le n)$ are uniquely determined by the last row $q(n:\cdot)$ via formulas

$$q(n':m') = \frac{q_0(n':m')}{1-q_0(n':0)}, \qquad 1\le m'\le n', \qquad (20)$$

$$q_0(n':m') = \sum_{m=1}^{n} q(n:m)\,\frac{\binom{m}{m'}\binom{n-m}{n'-m'}}{\binom{n}{n'}}, \qquad 0\le m'\le n'. \qquad (21)$$

Think of $\varkappa_n$ as allocation of $F_n$ balls in a box labeled B, and $n-F_n$ balls in other boxes. Formula (21) gives the distribution of the number of balls remaining in B after $n-n'$ balls have been removed at random without replacement, with account of the possibility $m'=0$ that B may become empty. Formula (20) says that the distribution of $F_{n'}$ is the same as that of the number of balls which remain in B conditionally given that at least one ball remains. This relation of $F_n$ and $F_{n'}$ is counter-intuitive, because sampling may eliminate the first block of $\varkappa_n$ completely (equivalently, the first block of $K_n$ may have no representatives in $[n']$).
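The passage from the row $q(n:\cdot)$ to the row $q(n':\cdot)$ by hypergeometric thinning followed by conditioning on a nonempty box can be checked in exact arithmetic. The sketch below is only illustrative: the function name `reduce_row` and the dictionary representation of a row are assumptions, and the test row $q(n:m)=1/n$ is what (19) gives when $\nu$ is taken uniform on $[0,1]$ with zero drift.

```python
from fractions import Fraction
from math import comb

def reduce_row(q_n, n, n_prime):
    """Row q(n':.) computed from row q(n:.); q_n maps m in 1..n to q(n:m)."""
    q0 = []
    for m_prime in range(0, n_prime + 1):
        total = Fraction(0)
        for m in range(1, n + 1):
            # chance that exactly m' of the m balls in box B are among the n' kept balls
            weight = Fraction(comb(m, m_prime) * comb(n - m, n_prime - m_prime),
                              comb(n, n_prime))
            total += q_n[m] * weight
        q0.append(total)
    # condition on the event that box B is not emptied
    return {m_prime: q0[m_prime] / (1 - q0[0]) for m_prime in range(1, n_prime + 1)}
```

With the uniform row for $n=5$, the reduced row for $n'=3$ is again uniform, as sampling consistency requires in that case.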

Invoking the simplest instance of (20), with $n' = n-1$ (and then writing $n+1$ in place of $n$), we have

$$q(n:m) = \frac{\frac{m+1}{n+1}\,q(n+1:m+1) + \frac{n+1-m}{n+1}\,q(n+1:m)}{1 - q(n+1:1)/(n+1)}.$$

This is a nonlinear recursion, but passing to formal homogeneous variables $\Phi(n:m)$ and using the substitution $q(n:m) = \Phi(n:m)/\Phi(n)$ with $\Phi(n) := \sum_{m=1}^n \Phi(n:m)$ results in the linear relation

$$\Phi(n:m) = \frac{m+1}{n+1}\,\Phi(n+1:m+1) + \frac{n-m+1}{n+1}\,\Phi(n+1:m).$$

Equivalently, in terms of the iterated differences

$$\Phi(n:m) = \binom{n}{m}\sum_{j=0}^{m}(-1)^{j+1}\binom{m}{j}\,\Phi(n-m+j), \qquad 1\le m\le n.$$

The positivity condition $\Phi(n:m)\ge 0$ implies that the sequence $(\Phi(n), n\ge 0)$ (where $\Phi(0)=0$) must be completely alternating [5], i.e. its iterated differences have alternating signs. The latter also means that the difference sequence $(\Phi(n+1)-\Phi(n), n\ge 0)$ is completely monotone, hence by the famous Hausdorff theorem the $\Phi(n)$'s are representable as moments of some finite measure on $[0,1]$. From this (18) follows for integer values of $\rho$ with some $(\nu, d)$. The latter secures (18) for arbitrary $\rho>0$ by the uniqueness of interpolation.

Interestingly, the argument only exploits a recursion on $q$, hence avoids explicit limit transition from $\varkappa$ to $\mathcal{R}$, as one could expect by analogy with Theorem 2.2. See pUfT^ ITTj for variations. □

We can also view $F(t) = 1-\exp(-S_t)$ as a random distribution function on $\mathbb{R}_+$ and construct a composition by sampling from $F$, as in the species sampling problem. These neutral to the right priors have found applications in Bayesian statistics [34].

Additive paintbox It is sometimes convenient to induce regenerative composition structures using a subordinator $(S_t)$ to create the gaps. Then, independent unit-rate exponential variables $E_1, E_2, \dots$ should be used in the role of balls, instead of uniform $U_j$'s in the multiplicative framework.

Formula (19) can be re-derived by appealing to the potential measure $\Pi$ of the subordinator. Heuristically, think of $\Pi(dy)$ as of the probability to visit location $y$ at some time, and of $\tilde\nu$ as the distribution of size of a generic jump of the subordinator. The probability that the first part $m$ of the composition is created by visiting $y$ (with all $n$ exponential balls still to the right of $y$) and then making a jump of given size $z$ is the product of $e^{-ny}\,\Pi(dy)$ and $\binom{n}{m}(1-e^{-z})^m e^{-(n-m)z}\,\tilde\nu(dz)$. Taking into account the formula for the Laplace transform of the potential measure [6]

$$\int_0^{\infty} e^{-\rho y}\,\Pi(dy) = \frac{1}{\Phi(\rho)}$$

we arrive at (19) by integration. The compensation formula for Poisson processes P, p. 76] is needed to make this argument rigorous.

The advantage of working with $\mathbb{R}_+$ is that the regeneration property involves no scaling. A disadvantage is that the asymptotic frequency of balls within the walls $(a,b)$ is the exponential probability $e^{-a}-e^{-b}$, as compared to the size of gap in the multiplicative representation on $[0,1]$.

In particular, for Ewens' composition structures the subordinator $(S_t)$ is a compound Poisson process with the jump distribution exponential($\theta$), so the range $\widetilde{\mathcal{R}}$ of $(S_t)$ is a homogeneous Poisson point process with density $\theta$, and $\mathcal{R}$ is an inhomogeneous Poisson point process with density $\theta/(1-x)$ on $[0,1]$.



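Example 3.5. Consider the infinite Lévy measure on $[0,1]$ with density $\nu(dx) = x^{-1}(1-x)^{\theta-1}\,dx$. Denoting $h_\theta(n) = \sum_{i=1}^{n}\frac{1}{\theta+i-1}$ the generalised harmonic numbers, we compute

$$q(n:m) = \frac{n!\,(\theta)_{n-m}}{m\,(n-m)!\,(\theta)_n\,h_\theta(n)}, \qquad p^{\circ}(\lambda_1,\dots,\lambda_k) = \frac{n!}{(\theta)_n}\prod_{j=1}^{k}\frac{1}{\lambda_j\,h_\theta(\Lambda_j)},$$

where $\Lambda_j = \lambda_j+\dots+\lambda_k$. This composition structure appears as the limit of stick-breaking composition structures (13) as $\gamma\to 0$. Although the CPF looks very similar to Ewens' (15), there is no simple product formula for the associated partition structure, even in the case $\theta = 1$.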
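As a sanity check on Example 3.5, the decrement matrix $q(n:m) = n!\,(\theta)_{n-m}/\big(m\,(n-m)!\,(\theta)_n\,h_\theta(n)\big)$ can be evaluated in exact rational arithmetic to confirm that each row sums to one. This is a hedged sketch: the helper names `rising`, `h` and `q` are illustrative only, and $\theta$ is assumed to be passed as a positive `Fraction`.

```python
from fractions import Fraction
from math import factorial

def rising(x, n):
    """Pochhammer symbol (x)_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def h(theta, n):
    """Generalised harmonic number h_theta(n) = sum_{i=1}^n 1/(theta+i-1)."""
    return sum(Fraction(1) / (theta + i - 1) for i in range(1, n + 1))

def q(theta, n, m):
    """Decrement matrix entry q(n:m) for the Levy density x^{-1}(1-x)^{theta-1}."""
    numerator = factorial(n) * rising(theta, n - m)
    denominator = m * factorial(n - m) * rising(theta, n) * h(theta, n)
    return numerator / denominator
```

For $\theta=1$ the formula collapses to $q(n:m) = 1/(m\,h_1(n))$, with $h_1(n)$ the ordinary harmonic number.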

Example 3.6. (Regenerative hook compositions) Hook composition structures are induced by killed pure-drift subordinators with $\nu(dx) = \delta_1(dx)$ and $d\in[0,\infty]$. They have decrement matrices with the only nonzero entries

$$q(n:n) = \frac{1}{1+nd}, \qquad q(n:1) = \frac{nd}{1+nd}.$$

The compositions $\varkappa_n$ only assume values like $(1,1,\dots,1,m)$. Ferrers diagrams of the associated partitions $(m,1,1,\dots,1)^{\downarrow}$ are $\Gamma$-shaped hooks.

The hook compositions bridge between the pure-singleton composition (with $\mathcal{R} = [0,1]$) and the trivial one-block composition (with $\mathcal{R} = \{0,1\}$). For an arbitrary composition structure with some Lévy exponent $\Phi$ we can construct a similar deformation by adding an atomic component $\beta\delta_1(dx)$ to the Lévy measure; this results in a family of decrement matrices

$$q(n:m) = \frac{\Phi(n:m) + \beta\,1(m=n)}{\Phi(n)+\beta}, \qquad (22)$$

with the one-block composition appearing in the limit $\beta\to\infty$.

Sliced splitting We introduce another kind of parametric deformation of a subordinator.

Let $S = (S_t)$ be a subordinator with range $\widetilde{\mathcal{R}}$, and let $X_1 < X_2 < \dots$ be the points of a homogeneous Poisson process with density $\theta$. Take $S(j)$, $j\ge 0$, to be independent copies of $S$, also independent of the Poisson process. We construct the path of the interrupted subordinator $S^{(\theta)}$ by shifting and glueing pieces of the $S(j)$'s in one path.

Run $S(0)$ until the passage through level $X_1$ at some time $T_1$, so $S_{T_1-}(0) < X_1 \le S_{T_1}(0)$. Leave the path of the process south-west of the point $(T_1, X_1)$ as it is, and cut the remaining north-east part of the path. At time $T_1$ start the process $(S_{t-T_1}(1) + X_1, t\ge T_1)$ and let it run until passage through $X_2$. Iterate, creating partial paths running from $(T_j, X_j)$ to $(T_{j+1}, X_{j+1})$. From the properties of subordinators and Poisson processes, one sees that $S^{(\theta)}$ is indeed a subordinator.

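The range of $S^{(\theta)}$,

$$\widetilde{\mathcal{R}}^{(\theta)} := \bigcup_{j\ge 0}\,[X_j, X_{j+1})\cap(X_j + \widetilde{\mathcal{R}}(j)), \qquad X_0 := 0,$$

can be called sliced splitting. First $\mathbb{R}_+$ is split at locations $X_j$, then each gap $(X_j, X_{j+1})$ is further split at the points of $X_j+\widetilde{\mathcal{R}}(j)$ falling in $(X_j, X_{j+1})$, where the $\widetilde{\mathcal{R}}(j) \stackrel{d}{=} \widetilde{\mathcal{R}}$ are i.i.d.

The range of $1-\exp(-S^{(\theta)}_t)$ can be constructed by a similar fitting in the gaps between the points $1-\exp(-X_j)$, which are the atoms of a Poisson point process with density $\theta/(1-x)$.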

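Denote, as usual, $\Phi, \tilde\nu$ the characteristics of $S$, and $\Phi_\theta, \tilde\nu_\theta$ the characteristics of $S^{(\theta)}$. Then we have

$$\Phi_\theta(\rho) = \frac{\rho}{\rho+\theta}\,\Phi(\rho+\theta), \qquad \tilde\nu_\theta[y,\infty] = e^{-\theta y}\,\tilde\nu[y,\infty]. \qquad (23)$$

To see this, a heuristic is helpful to guess the passage rate through an exponential level. Denote $E_\rho, E_\theta$ independent exponential variables with parameters $\rho, \theta$. The process $S^{(\theta)}$ passes the level $E_\rho$ within an infinitesimal time interval $(0,t)$ when $S_t > \min(E_\rho, E_\theta) = E_\rho$. The inequality $S_t > \min(E_\rho, E_\theta) \stackrel{d}{=} E_{\rho+\theta}$ occurs with probability $\Phi(\rho+\theta)t + o(t)$, and the probability of the event $E_\rho < E_\theta$ is $\rho/(\rho+\theta)$.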

The Green matrix For a sequence of compositions $\varkappa = (\varkappa_n)$ which, in principle, need not be consistent in any sense we can define $g(n,j)$ as the probability that a '1' stays in position $j$ of the binary code of $\varkappa_n$. That is to say, $g(n,j)$ is the probability that the parts of $\varkappa_n$ satisfy $\lambda_1+\dots+\lambda_{i-1} = j-1$ for some $i\ge 1$. Call $(g(n,j), 1\le j\le n, n\in\mathbb{N})$ the Green matrix of $\varkappa$. For $\varkappa$ a regenerative composition structure, $g(n,j)$ is the probability that the Markov chain $Q_n^{\downarrow}$ ever visits state $n+1-j$, and we have an explicit formula in terms of the Laplace exponent (see [21])


$$g(n,j) = \Phi(n-j+1)\binom{n}{j-1}\sum_{i=0}^{j-1}\binom{j-1}{i}\frac{(-1)^{i}}{\Phi(n-j+i+1)}. \qquad (24)$$
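The Markov-chain description of the Green matrix translates directly into a short dynamic program: starting from state $n$, the chain jumps from $k$ to $k-m$ with probability $q(k:m)$, and $g(n,j)$ is the probability of ever visiting $n+1-j$. The sketch below assumes the decrement matrix is supplied as a Python function `q(k, m)`; the function name `green_row` is hypothetical.

```python
from fractions import Fraction

def green_row(n, q):
    """Return [g(n,1), ..., g(n,n)] for the decreasing chain driven by q(k, m)."""
    visit = {n: Fraction(1)}  # the chain starts in state n
    for k in range(n - 1, 0, -1):
        # state k is reached by a jump of size k2 - k from some higher visited state k2
        visit[k] = sum(visit[k2] * q(k2, k2 - k) for k2 in range(k + 1, n + 1))
    return [visit[n + 1 - j] for j in range(1, n + 1)]
```

For the uniform decrement matrix $q(k:m) = 1/k$ (the case of $\nu$ uniform on $[0,1]$), the row for $n=3$ is $(1, 1/3, 1/2)$.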

3.4 Regenerative compositions from the two-parameter family 

Let $\pi$ be the two-parameter partition structure with PPF (5). Sometimes the notation PD($\alpha,\theta$) is used for the law of frequencies $S$, where PD stands for Poisson-Dirichlet, and sometimes this law is called the Pitman-Yor prior after [36]. Formulas for PD($\alpha,\theta$) are difficult, but the sequence of frequencies in size-biased order can be obtained by the inhomogeneous stick-breaking scheme (9) with $W_j \stackrel{d}{=}$ beta$(1-\alpha, \theta+j\alpha)$.

We will see that for $0\le\alpha<1$ and $\theta\ge 0$, and only for these values of the parameters, the parts of $\pi$ can be arranged in a regenerative composition structure.

Define a (multiplicative) Lévy measure $\nu$ on $[0,1]$ by the formula for its right tail

$$\nu[x,1] = x^{-\alpha}(1-x)^{\theta}. \qquad (25)$$

The density of this measure is a mixture of two beta-type densities, and in the case $\theta=0$ there is a unit atom at 1. The associated Laplace exponent is

$$\Phi(\rho) = \rho\,B(1-\alpha, \rho+\theta) = \frac{\rho\,\Gamma(1-\alpha)\,\Gamma(\rho+\theta)}{\Gamma(\rho+1-\alpha+\theta)}, \qquad (26)$$

and the binomial moments are

$$\Phi(n:m) = \binom{n}{m}\big(\alpha B(m-\alpha, n-m+1+\theta) + \theta B(m+1-\alpha, n-m+\theta)\big),$$

so there exists a regenerative composition structure $\varkappa$ with the decrement matrix

$$q(n:m) = \frac{\Phi(n:m)}{\Phi(n)} = \binom{n}{m}\frac{(1-\alpha)_{m-1}}{(\theta+n-m)_m}\,\frac{(n-m)\alpha+m\theta}{n}. \qquad (27)$$

It is a good exercise in algebra to show that the symmetrisation (3) of the product-form CPF with decrement matrix (27) is indeed the two-parameter PPF (5).
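The identity between the ratio $\Phi(n:m)/\Phi(n)$ and the closed form of the decrement matrix is easy to confirm numerically. The following sketch uses floating-point log-gamma evaluations; the function names are illustrative, and the parameters are assumed to satisfy $0<\alpha<1$, $\theta>0$ so that all gamma arguments are positive.

```python
from math import comb, exp, lgamma

def beta_fn(a, b):
    """Euler beta function B(a, b) for positive a, b."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def poch(x, k):
    """Pochhammer symbol (x)_k for positive x."""
    return exp(lgamma(x + k) - lgamma(x))

def phi_nm(n, m, alpha, theta):
    """Binomial moment Phi(n:m) of the measure with tail x^{-alpha} (1-x)^theta."""
    return comb(n, m) * (alpha * beta_fn(m - alpha, n - m + 1 + theta)
                         + theta * beta_fn(m + 1 - alpha, n - m + theta))

def phi(rho, alpha, theta):
    """Laplace exponent rho * B(1 - alpha, rho + theta)."""
    return rho * beta_fn(1 - alpha, rho + theta)

def q_closed(n, m, alpha, theta):
    """Closed-form decrement matrix entry q(n:m)."""
    return (comb(n, m) * poch(1 - alpha, m - 1) / poch(theta + n - m, m)
            * ((n - m) * alpha + m * theta) / n)
```

Comparing `phi_nm(n, m, a, t) / phi(n, a, t)` with `q_closed(n, m, a, t)` over a row also checks that the row sums to one up to rounding.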

Like their unordered counterparts, the two-parameter regenerative compositions have 
many interesting features. Three subfamilies are of special interest and, as the experience 
shows, should be always analysed first. 

Case $(0,\theta)$ for $\theta\ge 0$. This is the ESF case (15), with $\nu$ being the beta$(1,\theta)$ distribution. The blocks of the composition appear in the size-biased order, the gaps of $\mathcal{R}^c$ too.

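Case $(\alpha,0)$ for $0<\alpha<1$. In this case

$$\nu(dx) = \alpha x^{-\alpha-1}\,dx + \delta_1(dx)$$

is an infinite measure with a unit atom at 1. The composition structure is directed by $\mathcal{R} = \mathcal{Z}\cap[0,1]$, where $\mathcal{Z}$ is the range of a stable subordinator. On the other hand, $\mathcal{R}$ can be also obtained as the range of the multiplicative subordinator $(1-\exp(-S_t), t\ge 0)$, where $(S_t)$ is the subordinator with Lévy measure

$$\tilde\nu(dy) = \alpha(1-e^{-y})^{-\alpha-1}e^{-y}\,dy + \delta_\infty(dy).$$

The product formula specialises to

$$p^{\circ}(\lambda_1,\dots,\lambda_k) = \lambda_k\,\alpha^{k-1}\prod_{j=1}^{k}\frac{(1-\alpha)_{\lambda_j-1}}{\lambda_j!}.$$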

This composition structure was introduced in [H], where Z was realised as the zero set 
of a Bessel process of dimension 2 — 2a. For a = 1/2 this is the zero set of BM. 

The decrement matrix $q$ in this case has the special property that there is a probability distribution $h$ on the positive integers such that

$$q(n:m) = h(m) \ \text{ if } m<n \quad\text{and}\quad q(n:n) = 1-\sum_{m=1}^{n-1}h(m). \qquad (28)$$

This means that 1's in the binary code of $\varkappa_n$ can be identified with the set of sites within $1,\dots,n$ visited by a positive random walk on integers (discrete renewal process), with the initial state 1. Specifically,

$$h(m) = \frac{\alpha\,(1-\alpha)_{m-1}}{m!} \qquad (29)$$

and $q(n:n) = (1-\alpha)_{n-1}/(n-1)!$.

The arrangement of parts of $\pi_n$ in a composition is obtained by placing a size-biased part of $\pi_n$ in the last position in $\varkappa_n$, then by shuffling the remaining parts uniformly at random to occupy all other positions. Exactly the same rule applies on the paintbox level: for $S$ following PD($\alpha,0$), a term is chosen by the size-biased pick and attached to 1 as the meander, then the remaining gaps are arranged in the exchangeable order.

Case $(\alpha,\alpha)$ for $0<\alpha<1$. The associated regenerative set has zero drift and the Lévy measure

$$\tilde\nu(dy) = \alpha(1-e^{-y})^{-\alpha-1}e^{-\alpha y}\,dy, \qquad y\ge 0;$$

this is the zero set of an Ornstein-Uhlenbeck process. The corresponding range of the multiplicative subordinator can be realised as the zero set of a Bessel bridge of dimension $2-2\alpha$; in the case $\alpha=1/2$ this is the Brownian bridge.

The parts of $\varkappa_n$ are identifiable with the increments of a random walk with the same step distribution $h$ as in (29) for the $(\alpha,0)$ case, but now conditioned on visiting the state $n+1$. The CPF is

$$p^{\circ}(\lambda_1,\dots,\lambda_k) = \frac{n!}{\lambda_1!\cdots\lambda_k!}\,\frac{\alpha^{k}}{(\alpha)_n}\prod_{j=1}^{k}(1-\alpha)_{\lambda_j-1}. \qquad (30)$$

This function is symmetric for each $k$, which implies that the parts of each $\varkappa_n$ are in the exchangeable random order. This confirms the known fact that the excursion intervals of a Bessel bridge appear in exchangeable order.
Due to symmetry, the transition rule from $\varkappa_n$ to $\varkappa_{n+1}$ is a simple variation of the Chinese restaurant scheme. Now the tables are ordered in a row. Given $\varkappa_n = (\lambda_1,\dots,\lambda_k)$, customer $n+1$ is placed at one of the existing tables with chance $(\lambda_j-\alpha)/(n+\alpha)$ as usual, and when a new table is to be occupied, this table is placed with equal probability to the right, to the left or in-between any two of the $k$ tables occupied so far.
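The ordered Chinese restaurant rule for the $(\alpha,\alpha)$ case can be simulated directly. In the sketch below the function name and the `rng` argument are illustrative assumptions; a new table is opened with the complementary probability $(k+1)\alpha/(n+\alpha)$, which is what remains after the existing tables take $(\lambda_j-\alpha)/(n+\alpha)$ each.

```python
import random

def ordered_crp_alpha_alpha(n, alpha, rng=random):
    """Grow a random composition of n with tables in a row, (alpha, alpha) rule."""
    tables = [1]  # customer 1 opens the first table
    for _ in range(1, n):
        k = len(tables)
        # weights: (lambda_j - alpha) for each existing table, (k + 1) * alpha for a new one
        weights = [size - alpha for size in tables] + [(k + 1) * alpha]
        choice = rng.choices(range(k + 1), weights=weights, k=1)[0]
        if choice < k:
            tables[choice] += 1
        else:
            # the new table goes into one of the k + 1 slots with equal probability
            tables.insert(rng.randrange(k + 1), 1)
    return tables
```

Calling `ordered_crp_alpha_alpha(20, 0.5)` returns one random composition of 20, read from the left-most table to the right-most.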

In the case $(\alpha,0)$, there is a right meander appearing in consequence of killing at rate $\tilde\nu\{\infty\} = 1$. Removing the atom at $\infty$ yields another m-regenerative set (not in the two-parameter family) obtained by (i) splitting $[0,1]$ using beta$(1,\theta)$ stick-breaking, (ii) fitting in each gap $(Y_{j-1}, Y_j)$ a scaled copy of the $(\alpha,0)$ m-regenerative set. The decrement matrix is (22), with $\Phi$ like for the $(\alpha,0)$ m-regenerative set and $\beta=-1$. A discrete counterpart is a path of a random walk with reflection, but the CPF has no simple formula.

The m-regenerative set with parameters $0<\alpha<1$, $\theta>0$ is constructable from the sets $(0,\theta)$ and $(\alpha,0)$ by sliced splitting. To define a multiplicative version of the two-level paintbox, and to have a relation like (23), first split $[0,1]$ at points $Y_i$ of the Poisson process with density $\theta/(1-x)$ as in the Ewens case (recall that this is the same as stick-breaking with beta$(1,\theta)$ factor $W$). Then for each $j$ choose an independent copy of the $\alpha$-stable regenerative set starting at $Y_{j-1}$ and interrupted at $Y_j$, and use this copy to split $(Y_{j-1}, Y_j)$.

The resulting m-regenerative set corresponds to the $(\alpha,\theta)$ composition structure, so (23) becomes

$$\Phi_{\alpha,\theta}(\rho) = \frac{\rho}{\rho+\theta}\,\Phi_{\alpha,0}(\rho+\theta),$$

which is trivial to check. As another check, observe that the structural distribution beta$(1-\alpha,\alpha+\theta)$ is the Mellin convolution of beta$(1,\theta)$ and beta$(1-\alpha,\alpha)$, as it must be for the two-level splitting scheme.

The construction is literally the same on the level of finite compositions. First a regenerative Ewens $(0,\theta)$ composition of $n$ is constructed, then each part is independently split in a sequence of parts according to the rules of the regenerative $(\alpha,0)$-composition.

The arrangement problem for general $(\alpha,\theta)$ was settled recently in [44]. Note that every sequence $r_1, r_2, \dots$ of initial ranks $r_j\in[j]$ defines uniquely a total order on $\mathbb{N}$, by placing $j$ in position $r_j$ relatively to $1,\dots,j$. For instance, the initial ranks $1,2,1,3,\dots$ encode a total order in which the arrangement of the set $[4]$ is 3 1 4 2 (1 is ranked 1 within [1], then 2 is ranked 2 within [2], then 3 is ranked 1 within [3], then 4 is ranked 3 within [4], ...). For $\eta\in[0,\infty]$, consider a probability distribution for $(r_1, r_2, \dots)$ under which the $r_j$'s are independent, the probability of $r_j = j$ is $\eta/(\eta+j-1)$ and the probability of $r_j = i$ is $1/(\eta+j-1)$ for every $i<j$. Pitman and Winkel [44] show that to arrange $S \stackrel{d}{=}$ PD$(\alpha,\theta)$ in a regenerative paintbox one should (i) first label the frequencies in the size-biased order, (ii) then, independently, arrange the collection of frequencies by applying the arrangement to the labels, with parameter $\eta = \theta/\alpha$. For $\alpha=0$, the frequencies will be arranged in the size-biased order (because for $\eta=\infty$ the relative ranks are $r_j = j$ a.s.); for $\alpha=\theta$ this is an exchangeable arrangement of $S$; and for $\theta=0$ the arrangement is as for the $(\alpha,0)$ partition described above.
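The encoding of a total order by initial ranks is a plain insertion procedure: element $j$ is inserted so that it occupies position $r_j$ among $1,\dots,j$. A minimal sketch follows (the function name is an assumption of this illustration); it reproduces the arrangement 3 1 4 2 from the initial ranks 1, 2, 1, 3.

```python
def arrangement_from_initial_ranks(ranks):
    """Build the arrangement of [n] encoded by initial ranks r_1, ..., r_n."""
    order = []
    for j, r in enumerate(ranks, start=1):
        if not 1 <= r <= j:
            raise ValueError(f"initial rank r_{j} = {r} must lie in 1..{j}")
        # element j takes position r among the j elements placed so far
        order.insert(r - 1, j)
    return order
```

Ranks $r_j = j$ for all $j$ give the identity arrangement, which matches the size-biased (appearance) order in the case $\eta=\infty$.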

The arrangement of blocks of $\pi_n$ in the regenerative composition $\varkappa_n$ is analogous, for each $n$. See [18] for this and larger classes of distributions on permutations, their sufficiency properties and connections to the generalised ESF.
4 Regenerative partition structures and the problem of arrangement

We discuss next connections between regenerative composition structures and their asso- 
ciated partition structures. One important issue is the uniqueness of the correspondence. 

4.1 Structural distributions 

For $\mathcal{R}^c$ related to $S$ via (2) let $P$ be the size of the gap covering the uniform point $U_1$, with the convention that $P=0$ in the event $U_1\in\mathcal{R}$. We shall understand $P$ as a size-biased pick from $S$; this agrees with the (unambiguous) definition in the proper case and extends it when the sum of positive frequencies may be less than 1. Obviously, the particular choice of $\mathcal{R}$ with gap-sizes $S$ is not important.

The law of $P$ is known as the structural distribution of $S$. Most properties of this distribution readily follow from the fact that it is a mixture of discrete measures $\sum_j s_j\,\delta_{s_j}(dx) + \big(1-\sum_j s_j\big)\,\delta_0(dx)$. In particular, the $(n-1)$st moment of $P$ is the probability that $\varkappa_n$ is the trivial one-block composition $(n)$ or, what is the same, that $\pi_n = (n)^{\downarrow}$:

$$p(n) = E[P^{n-1}].$$

In general, there can be many partition structures which share the same structural distribution, but for the regenerative composition structures the correspondence is one-to-one. Indeed, we have

$$p(n) = q(n:n) = \frac{\Phi(n:n)}{\Phi(n)}.$$

With some algebra a recursion for the Laplace exponent follows

$$\big(p(n) + (-1)^{n}\big)\,\Phi(n) = \sum_{j=1}^{n-1}(-1)^{j+1}\binom{n}{j}\,\Phi(j), \qquad n\ge 2,$$

which shows that the moments sequence $(p(n), n\in\mathbb{N})$ determines $(\Phi(n), n\in\mathbb{N})$ uniquely up to a positive multiple, hence determines the decrement matrix $q$. Explicit expressions of the entries of $q$ through the $p(n)$'s are complicated; these are some rational functions in the $p(n)$'s, for instance $q(3:2) = (2p(2) - 3p(3) + p(2)p(3))/(1-p(2))$. Because the moments $p(n)$ are determined by the sizes of gaps and not by their arrangement, we conclude that

Theorem 4.1. Each partition structure corresponds to at most one regenerative composition structure. Equivalently, for random frequencies $S$ there exists at most one distribution for an m-regenerative set $\mathcal{R}$ with $(\mathcal{R}^c)^{\downarrow} = S$.

In principle, one can determine if some PPF $p$ corresponds to a regenerative CPF $p^{\circ}$ by computing $q$ formally from the one-block probabilities $p(n)$, then checking positivity of $q$, and if it is positive then comparing the symmetrised PPF (3) corresponding to $q$ with $p$. This method works smoothly in the two-parameter case. For the $(\alpha,\theta)$ partition structures the structural distribution is beta$(1-\alpha,\theta+\alpha)$ and

$$p(n) = \frac{(1-\alpha)_{n-1}}{(1+\theta)_{n-1}},$$

see Pitman |13]. Computing formally $q$ from the $p(n)$'s we arrive at $q$ coinciding with (27). However, a decrement matrix must be nonnegative, which is not the case for some values of the parameters:

Theorem 4.2. Every Ewens-Pitman partition structure with parameters in the range $0\le\alpha<1$, $\theta\ge 0$ has a unique arrangement as a regenerative composition structure. For other values of the parameters such an arrangement does not exist.

Actually, it is evident that any partition structure of the 'discrete series' with $\alpha<0$ in (5) cannot be regenerative just because the number of parts in each $\pi_n$ is bounded by $-\theta/\alpha$.
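The recipe of Section 4.1, computing $q$ formally from the one-block probabilities $p(n)$, can be carried out in exact arithmetic. The sketch below rests on two identities that are assumptions of this illustration as far as their exact form goes: the recursion $(p(n)+(-1)^n)\Phi(n) = \sum_{j=1}^{n-1}(-1)^{j+1}\binom{n}{j}\Phi(j)$ with the normalisation $\Phi(1)=1$, and the iterated-difference expression $\Phi(n:m) = \binom{n}{m}\sum_{j=0}^{m}(-1)^{j+1}\binom{m}{j}\Phi(n-m+j)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def phi_from_moments(p, n_max):
    """Phi(0..n_max), normalised by Phi(1) = 1, from one-block probabilities p(n)."""
    phi = {0: Fraction(0), 1: Fraction(1)}
    for n in range(2, n_max + 1):
        rhs = sum((-1) ** (j + 1) * comb(n, j) * phi[j] for j in range(1, n))
        phi[n] = rhs / (p(n) + (-1) ** n)
    return phi

def decrement(phi, n, m):
    """q(n:m) = Phi(n:m) / Phi(n), with Phi(n:m) as an iterated difference of Phi."""
    phi_nm = comb(n, m) * sum((-1) ** (j + 1) * comb(m, j) * phi[n - m + j]
                              for j in range(m + 1))
    return phi_nm / phi[n]
```

With $p(2)=1/4$ and $p(3)=1/8$ (the values for $\alpha=1/2$, $\theta=1$) this gives $q(3:2)=5/24$, in agreement with the rational expression for $q(3:2)$ quoted in Section 4.1; a negative entry of `decrement` would signal that no regenerative arrangement exists.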



4.2 Partition structures invariant under deletion of a part 

Recalling (7), (8), partition structures inherit a deletion property from the parent regenerative compositions. In this section we discuss the reverse side of this connection, which puts the regeneration property in a new light. The main idea is that if a partition structure $\pi$ has a part-deletion property, then the iterated deletion creates order in a way consistent for all $n$, thus canonically associating with $\pi$ a regenerative composition structure.

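Let $\pi$ be a partition structure. A random part of $\pi_n$ is an integer random variable $P_n$ which satisfies $P_n\in\pi_n$. The joint distribution of $\pi_n$ and $P_n$ is determined by the PPF and some deletion kernel $d(\lambda^{\downarrow}, m)$, which specifies the conditional distribution of $P_n$ given the partition $\pi_n$

$$p(\lambda^{\downarrow})\,d(\lambda^{\downarrow}, m) = \mathbb{P}(\pi_n = \lambda^{\downarrow}, P_n = m), \qquad |\lambda^{\downarrow}| = n. \qquad (31)$$

For each $n = 1,2,\dots$ the distribution of $P_n$ is then

$$q(n:m) = \mathbb{P}(P_n = m) = \sum_{\{\lambda^{\downarrow}:\,|\lambda^{\downarrow}|=n,\ m\in\lambda^{\downarrow}\}} d(\lambda^{\downarrow}, m)\,p(\lambda^{\downarrow}), \qquad 1\le m\le n. \qquad (32)$$

The formulas differ from (7) and (8) in that now they refer to some abstract 'random part' $P_n$ of an unordered structure. The requirement that $P_n$ is a part of $\pi_n$ makes

$$\sum_{\text{distinct } m\,\in\,\lambda^{\downarrow}} d(\lambda^{\downarrow}, m) = 1.$$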

Definition 4.3. Call a partition structure $\pi = (\pi_n)$ regenerative if, for each $n$, there exists a joint distribution for $\pi_n$ and its random part $P_n$ such that for each $1\le m\le n$, conditionally given $P_n = m$, the remaining partition $\pi_n\setminus\{m\}$ of $n-m$ has the same distribution as $\pi_{n-m}$. Call $\pi$ regenerative w.r.t. $d$ if the conditional distribution of $P_n$ is specified by $d$ as in (31), for each $n$. Call $\pi$ regenerative w.r.t. $q$ if $q(n:\cdot)$ is the law of $P_n$, which means that

$$p(\lambda^{\downarrow})\,d(\lambda^{\downarrow}, m) = q(n:m)\,p(\lambda^{\downarrow}\setminus\{m\}), \qquad n = 1,2,\dots. \qquad (33)$$
Example (Hook partition structures) This is a continuation of Example 3.6. Call $\lambda^{\downarrow}$ a hook partition if only $\lambda_1$ may be larger than 1, for instance $(4,1,1,1)^{\downarrow}$. For every deletion kernel with the property

$$d(\lambda^{\downarrow}, 1) = 1 \quad\text{if } 1\in\lambda^{\downarrow}$$

it can be shown that the only partition structures regenerative w.r.t. such $d$ are those supported by hook partitions, and they have $q(n:n) = 1/(1+nd)$, $q(n:1) = nd/(1+nd)$ for some $d\in[0,\infty]$. A regenerative partition structure is of the hook type if and only if $p((2,2)^{\downarrow}) = 0$.

Theorem 4.4. If a partition structure is regenerative and satisfies $p((2,2)^{\downarrow}) > 0$ then $q$ uniquely determines $p$ and $d$, and $p$ uniquely determines $q$ and $d$. Equivalently, if a regenerative partition structure is not of the hook type then the corresponding deletion kernel is unique.

4.3 Deletion kernels of the two-parameter family

For Ewens' composition structures (15) the deletion kernel is the size-biased pick

$$d_0(\lambda^{\downarrow}, m) = \frac{k_m\,m}{n}, \qquad\text{where } k_m = \#\{j : \lambda_j = m\},\ n = |\lambda^{\downarrow}|.$$

The factor $k_m$ appears since the kernel specifies the chance to choose one of the parts of given size $m$, rather than a particular part of size $m$. The regeneration of Ewens' partition structures under this deletion operation was observed by Kingman [37] and called non-interference, in a species sampling context. Kingman also showed that this deletion property is characteristic: if a partition structure is regenerative w.r.t. $d_0$, then the PPF is the ESF with some $\theta\in[0,\infty]$.

For the regenerative composition structures of the two-parameter family (with nonnegative $\alpha, \theta$) the deletion kernel is one of

$$d_\tau(\lambda^{\downarrow}, m) = \frac{k_m}{n}\,\frac{(n-m)\tau + m(1-\tau)}{1-\tau+(k-1)\tau}, \qquad \tau\in[0,1],$$

where $k = \sum_m k_m$ and $n = |\lambda^{\downarrow}|$. Kingman's characterisation of the ESF is a special case of a more general result (see Gnedin and Pitman [25]):

Theorem 4.5. Fix $\tau\in[0,1]$. The only partition structures that are regenerative w.r.t. the deletion kernel $d_\tau$ are the $(\alpha,\theta)$ partition structures with

$$0\le\alpha<1, \quad \theta\ge 0 \quad\text{and}\quad \alpha/(\theta+\alpha) = \tau.$$

Summarising, three subfamilies are characterised by:

1. The kernel $d_0$ is the size-biased choice; only $(0,\theta)$ partition structures are regenerative w.r.t. $d_0$.

2. The kernel $d_{1/2}$ is a uniform random choice of a part; only $(\alpha,\alpha)$ partition structures are regenerative w.r.t. $d_{1/2}$.

3. The kernel $d_1$ can be called cosize-biased deletion, as each (particular) part $m\in\lambda^{\downarrow}$ is selected with probability proportional to $|\lambda^{\downarrow}| - m$; only $(\alpha,0)$ partitions are regenerative w.r.t. $d_1$.

For general $\tau$, the kernel is intrinsically related to the Pitman-Winkel arrangement of blocks with $\eta = \tau^{-1}-1$, see Section 3.4.
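The three special kernels can be recovered from a single formula. The sketch below assumes the one-parameter form $d_\tau(\lambda^{\downarrow}, m) = \frac{k_m}{n}\cdot\frac{(n-m)\tau + m(1-\tau)}{1-\tau+(k-1)\tau}$, whose normalising constant is forced by the requirement that the kernel sums to one over distinct part sizes; the function name and the convention for a one-block partition are illustrative choices.

```python
from collections import Counter
from fractions import Fraction

def deletion_kernel(parts, tau):
    """Return {m: d_tau(lambda, m)} over the distinct part sizes m of a partition."""
    n, k = sum(parts), len(parts)
    if k == 1:
        return {n: Fraction(1)}  # a single part is deleted with certainty
    counts = Counter(parts)      # k_m = number of parts equal to m
    norm = n * (1 - tau + (k - 1) * tau)
    return {m: Fraction(k_m) * ((n - m) * tau + m * (1 - tau)) / norm
            for m, k_m in counts.items()}
```

For the partition (4, 2, 2, 1), `tau = 0` gives the size-biased weights $k_m m/n$, `tau = Fraction(1, 2)` gives the uniform weights $k_m/k$, and `tau = 1` gives weights proportional to $k_m(n-m)$.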

4.4 Back to regenerative compositions 

The framework of regenerative partitions suggests studying three objects: the PPF $p$, the deletion kernel $d$ and the distribution of the deleted part $q$. Naively, it might seem that $d$, which tells us how a part is deleted, is the right object to start with, like in Kingman's characterisation of the ESF via the size-biased deletion. However, apart from the deletion kernels $d_\tau$ for the two-parameter family, and kernels related to hook partitions, we do not know examples where the approach based on the kernels could be made explicit. Strangely enough, to understand the regeneration mechanism for partitions, one should ignore for a while the question of how a part is deleted, and only focus on $q$, which tells us what is deleted.

Fix $n$ and let $q(n : \cdot)$ be an arbitrary distribution on $[n]$. Consider a Markov $q(n : \cdot)$-chain
on the set of partitions of $n$ by which a partition $\lambda^{\downarrow}$ (thought of as allocation of
balls in boxes) is transformed by the rules:

• choose a value of $P_n$ from the distribution $q(n : \cdot)$,

• given $P_n = m$ sample without replacement $m$ balls and discard the boxes becoming
empty,

• put these $m$ balls in a newly created box.

Similarly, define a Markov $q(n : \cdot)$-chain on compositions $\lambda^{\circ}$ of $n$ with the only difference
that the newly created box is placed in the first position. Obviously, the $q(n : \cdot)$-chain
on compositions projects to the $q(n : \cdot)$-chain on partitions when the order of boxes is
discarded.
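
The three rules above can be sketched in Python as a single transition of the chain on compositions. The function name `chain_step`, the list representation of a composition and the uniform choice of $q(n:\cdot)$ in the usage lines are illustrative assumptions, not part of the source.

```python
import random


def chain_step(composition, q, rng):
    """One step of the q(n:.)-chain on compositions of n (illustrative sketch).

    composition: list of positive box sizes, in order.
    q: list of weights with q[m - 1] = q(n : m) for m = 1..n.
    """
    n = sum(composition)
    m = rng.choices(range(1, n + 1), weights=q)[0]  # value of P_n
    # Label each ball by the index of its box, then draw m balls without replacement.
    balls = [box for box, size in enumerate(composition) for _ in range(size)]
    sizes = list(composition)
    for box in rng.sample(balls, m):
        sizes[box] -= 1
    # Discard emptied boxes; the new box with m balls goes in the first position.
    return [m] + [s for s in sizes if s > 0]


rng = random.Random(0)
state = [3, 2, 1]
uniform_q = [1.0] * 6  # an arbitrary distribution on [6], chosen for illustration
for _ in range(5):
    state = chain_step(state, uniform_q, rng)
    print(state)
```

Sorting each state in decreasing order gives the projected chain on partitions.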

Lemma 4.6. If (33) holds for some fixed $n$ and distribution $q(n : \cdot)$ then the law of $\pi_n$
is a stationary distribution for the $q(n : \cdot)$-chain on partitions.

Sketch of proof The condition (33) may be written as a stochastic fixed-point equation

$$\pi_n \setminus \{P_n\} \stackrel{d}{=} \pi'_{n-P_n},$$

where $(\pi'_{n'},\ 1 \le n' \le n)$ is a sequence of random partitions, independent of $P_n$, with
$\pi'_{n'} \stackrel{d}{=} \pi_{n'}$. The lemma follows since then $\pi'_{n-P_n} \cup \{P_n\} \stackrel{d}{=} \pi_n$. □

There is an obvious parallel assertion about a random composition $\varkappa_n$, which satisfies

$$\varkappa_n \setminus \{F_n\} \stackrel{d}{=} \varkappa'_{n-F_n},$$

where $\setminus$ stands for the deletion of the first part $F_n$ with distribution $q(n : \cdot)$.
Lemma 4.7. The unique stationary distribution of the $q(n : \cdot)$-chain on compositions
is the one by which $\varkappa_n$ follows the product formula for $1 \le n' \le n$ with $q(n' : \cdot)$ given by
(20). Symmetrisation of the law of $\varkappa_n$ by (3) gives the unique stationary distribution of
the $q(n : \cdot)$-chain on partitions.

It follows that if (33) holds for some $n$ then it holds for all $n' \le n$, with all $p(\lambda^{\downarrow})$, $d(\lambda^{\downarrow}, \cdot)$
for $|\lambda^{\downarrow}| = n'$ uniquely determined by $q(n : \cdot)$ via sampling consistency. Thus, in principle,
for partitions of $n' \le n$ the regeneration property is uniquely determined by an arbitrary
discrete distribution $q(n : \cdot)$ through the following steps: first find $(q(n' : \cdot),\ n' \le n)$ from
sampling consistency (20), then use the product formula for compositions (6), then the
symmetrisation (3). With all this at hand, the deletion kernel can be determined from
(pTl) . Letting n vary, the sampling consistency of all q{n : ■)'s implies that g is a decrement 
matrix of a regenerative composition structure. 

Starting with $\pi_n$, the deletion kernel determines a Markov chain on subpartitions of $\pi_n$.
A part $P_n$ is chosen according to the kernel $d$ and deleted, from the remaining partition
$\pi_n \setminus \{P_n\}$ another part is chosen according to $d$, etc. This brings the parts of $\pi_n$ in the
deletion order.

Theorem 4.8. Suppose a partition structure $\pi = (\pi_n)$ is regenerative w.r.t. $q$, then

(i) $q$ is a decrement matrix of some regenerative composition structure $\varkappa$,

(ii) $\pi$ is the symmetrisation of $\varkappa$,

(iii) $\varkappa$ is obtained from $\pi$ by arranging, for each $n$, the parts of $\pi_n$ in the deletion order.

Thus the regeneration concepts for partition and composition structures coincide. It is
not clear, however, how to formulate the regeneration property in terms of the unordered
frequencies $S$. The only obvious way is to compute the PPF and then check if the PPF
corresponds to a regenerative CPF. Moreover, the deletion kernel may have no
well-defined continuous analogue. For instance, in the $(\alpha, \alpha)$ case $d_{1/2}$ is a uniform random
choice of a part from $\pi_n$, but what is a 'random choice of a term' from the infinite random
series $S$ under $\mathrm{PD}(\alpha, \alpha)$?

4.5 More on $(\alpha, \alpha)$ compositions: reversibility

We have seen that the $(\alpha, \alpha)$ composition structures are the only regenerative compositions
which have parts in the exchangeable order. We show now that these structures can be
characterised by some weaker properties of reversibility.

Every composition structure $\varkappa$ has a dual $\hat\varkappa$, where each $\hat\varkappa_n$ is the sequence of parts of
$\varkappa_n$ read in the right-to-left order. For example, the value $(3, 2)$ of $\varkappa_5$ corresponds to the
value $(2, 3)$ of $\hat\varkappa_5$. If $\varkappa$ is derived from $\mathcal R$, then $\hat\varkappa$ is derived from the reflected paintbox
$1 - \mathcal R$. If both $\varkappa$ and $\hat\varkappa$ are regenerative then by the uniqueness (Theorem 4.1) they must
have the same distribution. If $\varkappa$ is reversible, i.e. $\varkappa \stackrel{d}{=} \hat\varkappa$, then the first part of $\varkappa_n$ must
have the same distribution as its last part.

Theorem 4.9. Let $\varkappa$ be a regenerative composition structure. Let $F_n$ denote the first
and $L_n$ the last part of $\varkappa_n$. The following conditions are equivalent:

(i) $\mathbb P(F_n = 1) = \mathbb P(L_n = 1)$ for all $n$;

(ii) $F_n \stackrel{d}{=} L_n$ for all $n$;

(iii) $\varkappa_n \stackrel{d}{=} \hat\varkappa_n$ for all $n$ (reversibility);

(iv) $\varkappa$ is an $(\alpha, \alpha)$-composition structure with some $0 \le \alpha \le 1$.
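Sketch of proof Some manipulations with finite differences express both probabilities
through the Laplace exponent $\Phi$: the first is

$$\mathbb P(F_n = 1) = q(n : 1) = \frac{n\,(\Phi(n) - \Phi(n-1))}{\Phi(n)},$$

and $\mathbb P(L_n = 1)$ is a sum over $k = 2, \ldots, n$ of terms with binomial coefficients and $\Phi(k)$ in
the denominator. Equating these probabilities, one arrives at $\Phi(n) = (1 + \alpha)_{n-1}/(n-1)!$, where
$\Phi(2) := 1 + \alpha$ and the normalisation $\Phi(1) = 1$ is assumed. The latter is the Laplace exponent
corresponding to the $(\alpha, \alpha)$ composition. □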

Invoking the paintbox correspondence, the result implies

Corollary 4.10. For a random closed subset $\mathcal R$ of $[0, 1]$, the following two conditions
are equivalent:

(i) $\mathcal R$ is m-regenerative and $\mathcal R \stackrel{d}{=} 1 - \mathcal R$.

(ii) $\mathcal R$ is distributed like the zero set of a Bessel bridge of dimension $2 - 2\alpha$, for some
$0 \le \alpha \le 1$.

The degenerate boundary cases with $\alpha = 0$ or $1$ are defined by continuity.

5 Self-similarity and stationarity 
Self-similarity of a random closed set $\mathcal Z \subset \mathbb R_+$ is the condition $c\mathcal Z \stackrel{d}{=} \mathcal Z$, $c > 0$. The
property is a multiplicative analogue of the stationarity property (translation invariance)
of a random subset of $\mathbb R$, as familiar from the elementary renewal theory (see [3S] for a
general account). We encountered self-similarity in connection with paintboxes for $(\alpha, 0)$
compositions.

Regenerative $(0, \theta)$ compositions can also be embedded in the self-similar framework
by passing to duals. The mirrored paintbox for the dual Ewens' composition structure is
the stick-breaking set $\hat{\mathcal R} = \{V_1 \cdots V_i,\ i = 0, 1, \ldots\}$ with i.i.d. $V_i \stackrel{d}{=} \mathrm{beta}(\theta, 1)$. This set is
the restriction to $[0, 1]$ of a self-similar Poisson point process with density $\theta/y$, $y > 0$.

Introduce the operation of right reduction as cutting the last symbol of the binary 
code of composition. For instance, the right reduction maps 100110 to 10011. 
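
A minimal Python sketch of the binary code and of right reduction follows. It assumes the encoding convention in which a part of size $s$ is written as a '1' followed by $s - 1$ zeros (so every code starts with '1'); the function names are hypothetical.

```python
def to_binary_code(composition):
    """Encode a composition: a part of size s becomes '1' followed by s - 1 zeros."""
    return "".join("1" + "0" * (s - 1) for s in composition)


def from_binary_code(code):
    """Decode a binary string starting with '1' back into a list of part sizes."""
    parts = []
    for digit in code:
        if digit == "1":
            parts.append(1)   # a '1' opens a new part
        else:
            parts[-1] += 1    # a '0' extends the current part
    return parts


def right_reduction(code):
    """Cut the last symbol of the binary code."""
    return code[:-1]


code = to_binary_code([3, 1, 2])
print(code, right_reduction(code), from_binary_code(right_reduction(code)))
```

Under this convention right reduction shortens the last part by one ball, or removes it when the last part is a singleton.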

Definition 5.1. A sequence of random compositions $\varkappa = (\varkappa_n)$ is called right-consistent
if the right reduction maps $\varkappa_{n+1}$ to a stochastic copy of $\varkappa_n$. If $\varkappa$ is a composition structure,
we call it self-similar if it is right-consistent.

If a sequence of compositions $\varkappa = (\varkappa_n)$ is right-consistent, it can be realised on the
same probability space as a single infinite random binary string $\eta_1, \eta_2, \ldots$, with $\varkappa_n$ being
the composition encoded in the first $n$ digits $\eta_1, \ldots, \eta_n$. For right-consistent $\varkappa$ the Green
matrix is of the form

$$g(n, j) = \mathbb P(\eta_j = 1), \quad 1 \le j \le n, \quad n = 1, 2, \ldots$$

and we shall simply write $g(j)$.

Theorem 5.2. A composition structure $\varkappa$ is self-similar iff the paintbox $\mathcal R$ is the
restriction to $[0, 1]$ of a self-similar set $\mathcal Z$. In this case $\varkappa$ can be encoded in an infinite
binary string.

Sketch of proof The 'if part is easily shown using the modified sampling scheme, as in 
the BM example. The 'only if part exploits convergence of random sets as in Theorem 
[O □ 

An arbitrary infinite binary string $\eta_1, \eta_2, \ldots$ (starting from 1) need not correspond to a
composition structure, because care of the sampling consistency should be taken. Let us
review the $(0, \theta)$ and $(\alpha, 0)$ compositions from this standpoint.

Example. For $\theta > 0$ let $\eta_1, \eta_2, \ldots$ be a Bernoulli string with independent digits and

$$g(j) = \mathbb P(\eta_j = 1) = \frac{\theta}{j + \theta - 1}.$$

This encodes the dual Ewens' composition structure, with the last-part deletion property.
In the modified sampling scheme, the role of balls is taken by a homogeneous Poisson
point process, and the boxes are created by points of an independent self-similar Poisson
process.
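
The Bernoulli string of this example is easy to simulate. The sketch below assumes the success probabilities $\theta/(\theta + j - 1)$ and the convention that a '1' opens a new part; `bernoulli_string`, `decode` and `expected_parts` are hypothetical helper names.

```python
import random


def bernoulli_string(n, theta, rng):
    """Independent digits eta_1..eta_n with P(eta_j = 1) = theta / (theta + j - 1)."""
    return [1 if rng.random() < theta / (theta + j - 1) else 0
            for j in range(1, n + 1)]


def decode(digits):
    """Turn a binary string (first digit 1) into the composition it encodes."""
    parts = []
    for d in digits:
        if d == 1:
            parts.append(1)
        else:
            parts[-1] += 1
    return parts


def expected_parts(n, theta):
    """Expected number of parts: the sum of the success probabilities."""
    return sum(theta / (theta + j - 1) for j in range(1, n + 1))


rng = random.Random(1)
digits = bernoulli_string(20, 1.5, rng)
print(decode(digits), expected_parts(20, 1.5))
```

The first digit is always 1 because $g(1) = 1$, and truncating the string to its first $n - 1$ digits is exactly the right reduction, which is why the family is right-consistent by construction.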

The family of composition structures can be included in a Markov process with $\theta \ge 0$
considered as a continuous time parameter [28]. On the level of paintboxes the dynamics
amounts to intensifying Poisson processes, so that within time $d\theta$ the Poisson process
$\mathcal Z = \mathcal Z_\theta$ is superimposed with another independent Poisson process with density $d\theta/x$. This
is an instance of sliced splitting, so fl2^ is in force. From this viewpoint a special feature 
is that the 6'-splitting are consistently defined, also in terms of interrupted subordinators, 
which are here compound Poisson processes with exponential jumps. 

Remarkably, the splitting process remains Markovian in terms of the binary codes, and
has the dynamics in which every '0' eventually turns into '1' by the rule: at time $\theta$, a '0' in
the generic position $j$ of the code is switching at rate $1/(\theta + j - 1)$ to a '1', independently
of digits in all other positions.

Example. For $\alpha \in (0, 1)$ let $(T_k)$ be a discrete renewal process with $T_0 = 1$ and
independent increments with distribution

$$\mathbb P(T_{k+1} - T_k = m) = (-1)^{m-1}\binom{\alpha}{m}, \quad m = 1, 2, \ldots$$

(the case $\alpha = 1/2$ is related to the recurrence time of a standard random walk). For
$\eta_j = 1\big(\bigcup_{k \ge 0}\{T_k = j\}\big)$ the sequence $\eta_1, \eta_2, \ldots$ encodes the regenerative $(\alpha, 0)$ composition
structure. The Green matrix is g{j) = {a)n-j/{n — j)!. 
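
As a numerical illustration, the sketch below computes the renewal probabilities of a process whose increments have the distribution $(-1)^{m-1}\binom{\alpha}{m}$ and compares them with $(\alpha)_k/k!$, where $k$ is the lag from the starting point. This identity follows from the generating function $1/(1-s)^{\alpha}$; the function names are hypothetical and the code is a sanity check, not a derivation from the source.

```python
def increment_pmf(alpha, m_max):
    """f[m] = (-1)^(m-1) * binom(alpha, m) for m = 1..m_max (f[0] is unused).

    Uses the recursion f[m + 1] = f[m] * (m - alpha) / (m + 1), starting at f[1] = alpha.
    """
    f = [0.0, alpha]
    for m in range(1, m_max):
        f.append(f[m] * (m - alpha) / (m + 1))
    return f


def renewal_probabilities(alpha, k_max):
    """u[k] = probability that the renewal process visits the site at lag k."""
    f = increment_pmf(alpha, k_max)
    u = [1.0]
    for k in range(1, k_max + 1):
        # Condition on the size m of the first increment.
        u.append(sum(f[m] * u[k - m] for m in range(1, k + 1)))
    return u


def rising_ratio(alpha, k):
    """(alpha)_k / k!, with (alpha)_k the rising factorial."""
    r = 1.0
    for i in range(k):
        r *= (alpha + i) / (i + 1)
    return r


u = renewal_probabilities(0.5, 10)
print([round(x, 6) for x in u[:4]], [round(rising_ratio(0.5, k), 6) for k in range(4)])
```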
It is known [17] that no other Bernoulli or renewal strings are sampling consistent, i.e.
produce composition structures. We shall turn to a larger class of strings with a Markov
property, but first review a few general features of the self-similar compositions.

Let $P_n$ be the size-biased pick from $\varkappa_n$, and $L_n$ be the last part of the composition.
Similarly, let $P$ be the size-biased gap-length of $\mathcal R$, and $L$ be the size of the meander gap
adjacent to 1.

Theorem 5.3. Let $\varkappa$ be a self-similar composition structure, thus derived from some
self-similar set $\mathcal Z$. Then

(i) $P \stackrel{d}{=} L$, and (ii) $P_n \stackrel{d}{=} L_n$,

and the Green matrix is $g(j) = \mathbb E(1 - P)^{j-1}$.

Sketch of proof Since reducing the last box by one ball has the same effect as reducing the 
box chosen by the size-biased pick, the sizes of the boxes must have the same distribution. 
This yields (ii), and (i) follows as n ^ oo. Alternatively, inspecting the gap covering Un-.n 
it is seen that E[L"~^] = p°{n), the probability of one-block composition, so the moments 
of P and L coincide. Similarly, rjj = 1 in the event Un;i > max(Z fl [0, Un-.j])- □ 
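
The formula $g(j) = \mathbb E(1-P)^{j-1}$ can be sanity-checked on the Ewens case. The sketch below assumes that the structural distribution there is $\mathrm{beta}(1, \theta)$ (a standard fact, not restated in this section) and compares the resulting moments with $\theta/(\theta + j - 1)$; `beta_fn` and `green_from_structural` are hypothetical names.

```python
from math import gamma, isclose


def beta_fn(a, b):
    """Euler's Beta function B(a, b)."""
    return gamma(a) * gamma(b) / gamma(a + b)


def green_from_structural(theta, j):
    """E(1 - P)^(j - 1) for P ~ beta(1, theta), as a ratio of Beta functions."""
    return beta_fn(1.0, theta + j - 1) / beta_fn(1.0, theta)


theta = 0.7
for j in (1, 2, 5, 10):
    print(j, green_from_structural(theta, j), theta / (theta + j - 1))
```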



The identity (ii) together with a generalisation of a result by Pitman and Yor [45]
yields a characterisation of structural distributions, and shows that $P$ has a decreasing
density on $(0, 1]$.

Theorem 5.4. [26] The structural distribution for a self-similar composition structure
is of the form

$$\mathbb P(P \in dx) = \frac{\nu[x, 1]}{(d + m)(1 - x)}\,dx + \frac{d}{d + m}\,\delta_0(dx), \quad x \in [0, 1], \qquad (35)$$

where $d \ge 0$ and $\nu$ is a measure on $(0, 1]$ with

$$m := \int_0^1 |\log(1 - x)|\,\nu(dx) < \infty.$$

There is no atom at 0 iff $d = 0$ iff $\mathcal R$ has Lebesgue measure zero.



5.1 Markovian composition structures 

For the time being we switch to regeneration in the right-to-left order of parts, starting from
the last part, as for the dual Ewens' composition. This is more convenient in the
self-similar context since 0 is the center of homothety. We first modify the deletion property
of compositions by allowing a special distribution for the first deleted part (which is now
the last part of the composition).

Definition 5.5. A composition structure is called Markovian if the CPF is of the
product form

$$p^{\circ}(\lambda^{\circ}) = q^{(0)}(n : \lambda_k)\prod_{j=1}^{k-1} q(\Lambda_j : \lambda_j), \qquad \Lambda_j = \lambda_1 + \cdots + \lambda_j, \qquad (36)$$

where $q^{(0)}$ and $q$ are two decrement matrices.

Similarly to (6), formula (36) says that 1's in the binary code of $\varkappa$ appear at sites $Q_n^{\downarrow}(t) + 1$
visited by a decreasing Markov chain, with the only new feature that the distribution
of the first decrement is determined by $q^{(0)}$, and not by $q$.

The counterpart of Theorem 13.41 for fl5^ is straightforward. For [St) a subordina- 
tor, consider the process {V ■ exp(— S't), t > 0), where V takes values in (0,1) and is 
independent of (St)- The range of this process is a m-regenerative set (now with right- 
to-left regeneration) scaled by the random factor V. Taking this set for paintbox TZ, 
thus with the meander gap [V,l], a Markovian composition structure is induced with 
$q(n : m) = \Phi(n : m)/\Phi(n)$ as in (IT^, and

$$q^{(0)}(n : m) = \Phi^{(0)}(n : 0)\,q(n : m) + \Phi^{(0)}(n : m), \qquad \Phi^{(0)}(n : m) := \binom{n}{m}\,\mathbb E\{V^{n-m}(1 - V)^m\}.$$

Every Markovian composition structure is of this form.



5.2 Self-similar Markov composition structures 

Let $Q^{\uparrow} = (Q^{\uparrow}(t),\ t = 0, 1, \ldots)$ be a time-homogeneous increasing Markov chain on $\mathbb N$ with
$Q^{\uparrow}(0) = 1$. An infinite string $\eta_1, \eta_2, \ldots$ is defined as the sequence of sites visited by $Q^{\uparrow}$:

$$\eta_j = 1(Q^{\uparrow}(t) = j \text{ for some } t).$$

If the string determines some composition structure $\varkappa$, then $\varkappa$ is self-similar. A
composition structure is called self-similar Markov if it has such a binary representation generated
by an increasing Markov chain.

A stationary regenerative set (or stationary Markov [3S]) is the range of a process
$(X + S_t,\ t \ge 0)$ where $(S_t)$ is a finite mean-subordinator, with Lévy measure satisfying

$$m = \int_0^\infty y\,\tilde\nu(dy) < \infty,$$

drift $d \ge 0$ and the initial value $X$ whose distribution is

$$\mathbb P(X \in dy) = \frac{\tilde\nu[y, \infty]}{d + m}\,dy + \frac{d}{d + m}\,\delta_0(dy)$$

(unlike $\nu$ in (35), $\tilde\nu$ lives on $(0, \infty)$).

Theorem 5.6. [26] A composition structure $\varkappa$ is self-similar Markov if and only if
$\mathcal R = \exp(-\tilde{\mathcal R})$, where $\tilde{\mathcal R}$ is a stationary regenerative set.

The distribution of the size-biased pick is then (35) with $\nu$ the image of $\tilde\nu$ under $y \mapsto 1 - e^{-y}$.
The Green matrix can be written in terms of the Laplace exponent

$$g(j) = \frac{1}{d + m}\,\frac{\Phi(j - 1)}{j - 1} \quad \text{for } j > 1, \qquad g(1) = 1.$$

The relation between this and (24) is that the RHS of (24) converges to $g(j)$ as $n \to \infty$.
This fact is analogous to the elementary renewal theorem.

As in the regenerative case, the decrement matrices are determined, in principle, by
the probabilities $(p(n),\ n \ge 0)$, which are moments of the structural distribution, whence
the analogue of Theorem 4.1:

Theorem 5.7. If a partition structure admits arrangement as a self-similar Markov 
composition structure, then such arrangement is unique in distribution. 

Application to the two-parameter family For $0 < \alpha < 1$ and $\theta > 0$ let $\mathcal R_{\alpha,\theta}$ be the
m-regenerative set associated with the $(\alpha, \theta)$ regenerative composition structure, and let $V$ be an
independent variable whose distribution is $\mathrm{beta}(\theta + \alpha, 1 - \alpha)$. Then the scaled reflected set
$V \cdot (1 - \mathcal R_{\alpha,\theta})$ is associated with a self-similar Markov composition structure corresponding
to the $(\alpha, \theta - \alpha)$ partition structure. This follows from the stick-breaking representation of the
frequencies in size-biased order, with independent factors $\mathrm{beta}(\theta + j\alpha, 1 - \alpha)$, $j = 1, 2, \ldots$.
The Green function $g$ and transition probabilities for $Q^{\uparrow}$ can be readily computed.

A 'stationary' version of the regenerative $(\alpha, \theta)$ composition is the self-similar Markov
arrangement of the $(\alpha, \theta - \alpha)$ partition. The structural distribution is $\mathrm{beta}(1 - \alpha, \theta + \alpha)$,
which is also the law of the meander size $1 - V$. Note that $\theta - \alpha$ may assume negative
values, hence every partition with $\theta > -\alpha$ has a self-similar Markov arrangement. This
'rehabilitates' $(\alpha, \theta)$ partitions with $-\alpha < \theta < 0$ that lack regeneration literally: the
property appears in a modified form, as stationary regeneration. If $\theta \ge 0$ then both types
of regeneration are valid.^

The $(\alpha, 0)$ composition with left-to-right regeneration is also self-similar Markov, i.e.
has the 'stationary' right-to-left regeneration property. This combination of regeneration
properties is characteristic for this class.

For the $(\alpha, \alpha)$ partition structure there exists a regenerative arrangement associated
with the Bessel bridge, and there is another self-similar Markov arrangement. The latter is
the self-similar version of the regenerative $(\alpha, 2\alpha)$ composition.

The arrangement of the $(\alpha, \theta)$ partition in a self-similar Markov composition structure is
the same on both the paintbox and the finite-$n$ level. The size-biased pick is placed at the end,
then the remaining parts are arranged to the left of it as for the dual $(\alpha, \theta + \alpha)$ regenerative
structure, see Section 3.4. Property (i) in Theorem 5.3 holds in the strong sense:
conditionally given the unordered frequencies $S$, the length of the meander is a size-biased pick
(see 

^ For the 'discrete series' of the parameter values, with $\alpha < 0$, no regeneration property can exist, simply
because the paintbox has uniformly bounded cardinality.
6 Asymptotics of the block counts



For $\varkappa = (\varkappa_n)$ a regenerative composition structure, let $K_n$ be the number of parts in $\varkappa_n$
and let $K_{n,r}$ be the number of parts equal to $r$, so that $\sum_r r K_{n,r} = n$, $\sum_r K_{n,r} = K_n$. For
instance, in the event $\varkappa_{10} = (2, 4, 2, 1, 1)$ we have $K_{10} = 5$, $K_{10,1} = 2$, $K_{10,2} = 2$, $K_{10,3} = 0$,
etc. The full vector $(K_{n,1}, \ldots, K_{n,n})$ is one of the ways to record the partition associated
with $\varkappa_n$. In the species sampling context, $K_n$ is the number of distinct species represented
in a sample, hence it is often considered as a measure of diversity.

We are interested in the large-$n$ asymptotics of $K_n$ and $K_{n,r}$ for $r = 1, 2, \ldots$. This can
be called the small-blocks problem. Typically the composition will have relatively few
large parts of size of order $n$ and many parts of size $r \ll n$, the latter making
the principal contribution to $K_n$.

Unless indicated otherwise, we assume that $d = 0$ (proper case, no drift) and that
$\tilde\nu\{\infty\} = 0$ (no killing, no right meander). Then the order of growth of $K_n$ is sublinear,
$K_n \ll n$, and $K_n \uparrow \infty$ almost surely.

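One general tool is the structural distribution $\sigma$ of the size-biased pick $P$, which can
be used to compute the expectations via

$$\mathbb E[K_n] = \int_{[0,1]} \frac{1 - (1 - x)^n}{x}\,\sigma(dx), \qquad \mathbb E[K_{n,r}] = \binom{n}{r}\int_{[0,1]} x^{r-1}(1 - x)^{n-r}\,\sigma(dx).$$

It is clear from these formulas that the asymptotics of the moments are determined by
the behaviour of $\sigma$ near 0, because $(1 - x)^n$ decays exponentially fast on any interval $[\epsilon, 1]$.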
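
As an illustration, the sketch below evaluates the expected number of blocks for Ewens' partition in two ways. It assumes the standard identity $\mathbb E[K_n] = \int (1-(1-x)^n)\,x^{-1}\,\sigma(dx)$ and the $\mathrm{beta}(1, \theta)$ structural distribution; the midpoint rule, the step count and the function names are illustrative choices.

```python
def expected_blocks_numeric(n, theta, steps=20_000):
    """Midpoint-rule value of the integral of (1 - (1-x)^n)/x against beta(1, theta)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = theta * (1.0 - x) ** (theta - 1.0)  # beta(1, theta) density
        total += (1.0 - (1.0 - x) ** n) / x * density
    return total * h


def expected_blocks_exact(n, theta):
    """Closed form for Ewens' partition: sum of theta / (theta + j - 1), j = 1..n."""
    return sum(theta / (theta + j - 1) for j in range(1, n + 1))


print(expected_blocks_numeric(20, 2.0), expected_blocks_exact(20, 2.0))
```

The value $\theta = 2$ keeps the integrand a polynomial, so the midpoint rule is accurate; for $\theta < 1$ the density is singular at $x = 1$ and a finer rule would be needed.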

The block counts $K_n$, $K_{n,r}$ depend only on the partition, and not on the order of
the parts. Nevertheless, the Markovian character of regenerative compositions and the
connection with subordinators can be efficiently exploited to study these functionals by
methods of the renewal theory. This may be compared with other classes of partitions
studied with the help of local limit theorems: partitions obtained by conditioning
random sums of independent integer variables [2], and partitions derived from conditioned
subordinators [12].

For Ewens' partitions it is well known that $K_n$ is asymptotically normal, with both
mean and variance of the order of $\log n$ (see [21 S3]). In contrast to that, for $(\alpha, \theta)$
partitions with $\alpha > 0$ the right scale for $K_n$ is $n^\alpha$ ($\alpha$-diversity [13]). These known facts
will be embedded in a much more general consideration.

The number of parts satisfies a distributional fixed-point equation

$$K_n \stackrel{d}{=} 1 + K'_{n - F_n},$$

where $K'_m$, $m \le n - 1$, are independent of the first part $F_n$ with distribution $q(n : \cdot)$, and
satisfy $K'_m \stackrel{d}{=} K_m$. Known asymptotics (e.g. [39], [11]) derived from such identities do not
cover the full range of possibilities and require very restrictive moment conditions which
are not easy to provide (see however [30] for one application of this approach). In what
follows we report on the asymptotics which were obtained by different methods, based
on the connection with subordinators, poissonisation, methods of the renewal theory, and
Mellin transform [29], [30], [3l, [2Q].

We assume as before the paintbox construction with balls $U_1, \ldots, U_n$ and $\mathcal R$ the closed
range of a multiplicative subordinator $(1 - \exp(-S_t),\ t \ge 0)$. In these terms, $K_{n,r}$ is the
number of gaps in the range hit by exactly $r$ out of $n$ uniform points, and $K_n$ is the total
number of nonempty gaps.

Remark If the subordinator has positive drift $d > 0$, then $K_n \sim K_{n,1} \sim n\,\mathrm{meas}(\mathcal R)$ a.s.,
so singletons make a leading contribution to $K_n$. The Lebesgue measure of $\mathcal R$ is a random
variable proportional to the exponential functional of the subordinator,

$$\mathrm{meas}(\mathcal R) = d\int_0^\infty \exp(-S_t)\,dt.$$

It is informative to consider the number of parts $K_n$ as the terminal value of the
increasing process $\mathcal K_n := (\mathcal K_n(t),\ t \ge 0)$, where $\mathcal K_n(t)$ is the number of parts of the
subcomposition derived from the configuration of uniform points not exceeding $1 - \exp(-S_t)$,
i.e. produced by the subordinator within the time $[0, t]$. The number of $r$-parts $K_{n,r}$ is
the terminal value of another process $\mathcal K_{n,r} := (\mathcal K_{n,r}(t),\ t \ge 0)$ which counts $r$-parts, but
this process is not monotone.

We can think of the subordinator representation of a regenerative composition
structure as a coagulation process in which, if at time $t$ there are $n'$ particles, every $m$-tuple
of them is merging to form a single particle at rate $\Phi(n' : m)$. The particle emerging
from the coalescence is immediately frozen.^ Starting with $n$ particles, $\mathcal K_n(t)$ counts the
number of frozen particles at time $t$.

The asymptotics in the small-block problem largely depend on the behaviour of the
right tail of the Lévy measure near 0. If $\tilde\nu$ is finite, then simply $\tilde\nu[y, \infty] \to \tilde\nu[0, \infty]$ as
$y \to 0$, but if $\tilde\nu$ is infinite it seems difficult, if at all possible, to make any conclusions
without the following assumption.

Assumption of regular variation We shall suppose that $\tilde\nu$ satisfies the condition of
regular variation

$$\tilde\nu[y, \infty] \sim \ell(1/y)\,y^{-\alpha}, \quad y \downarrow 0, \qquad (37)$$

where the index satisfies $0 \le \alpha \le 1$ and $\ell$ is a function of slow variation at $\infty$, i.e. $\ell$
satisfies $\ell(t/y)/\ell(1/y) \to 1$ as $y \to 0$ for all $t > 0$.

Note that the assumption is satisfied in the case of finite $\tilde\nu$. By the monotone
density version of Karamata's Tauberian theorem [9], for $0 \le \alpha < 1$ the condition (37) is
equivalent to the asymptotics of the Laplace exponent

$$\Phi(\rho) \sim \Gamma(1 - \alpha)\,\rho^{\alpha}\,\ell(\rho), \quad \rho \to \infty.$$

Qualitatively different asymptotics are possible. Very roughly, the whole spectrum can
be divided into the following cases, each requiring separate analysis.

• The finite Lévy measure case. This is the case of stick-breaking compositions, with
$(S_t)$ a compound Poisson process.

• The slow variation case with $\alpha = 0$ and $\ell(y) \to \infty$ as $y \to \infty$. Typical example:
regenerative compositions associated with gamma subordinators.

• The proper regular variation case with $0 < \alpha < 1$. Typical example: composition
associated with $\alpha$-stable subordinator (with $0 < \alpha < 1$).

One principal difference between the cases of (proper) regular and slow variation is in
the time scales at which major growth and variability of $\mathcal K_n$ occur. In the case $\alpha > 0$ all
$\mathcal K_n(t)$, $\mathcal K_{n,r}(t)$ are of the same order as $K_n$, whereas in the case $\alpha = 0$ we have $\mathcal K_n(t) \ll K_n$.

^ The dynamics is analogous to that of Pitman-Sagitov $\Lambda$-coalescents, with the difference that in the
$\Lambda$-coalescents the mergers remain active and keep coagulating with other existing particles [40]. For a
class of $\Lambda$-coalescents a coupling with compositions was used to explore asymptotics of the coalescent
processes [20].

6.1 Stick-breaking compositions 

In the case of finite Lévy measure we scale $\tilde\nu$ to a probability measure. Then $\tilde\nu$ is the
distribution of $-\log(1 - W)$, where $W$ is the generic stick-breaking factor. Introduce the
moments

$$m := \mathbb E[-\log(1 - W)], \quad \sigma^2 := \mathrm{Var}[\log(1 - W)], \quad m_1 := \mathbb E[-\log W], \qquad (38)$$

which may be finite or infinite.

Let $M_n$ be the index of the rightmost occupied gap, which contains the maximum order
statistic $U_{n:n}$. Roughly speaking, stick-breaking implies a fast exponential decay of the
sizes of gaps, hence one can anticipate a cutoff phenomenon: empty gaps can occur only in
a range close to $U_{n:n}$. From the extreme-value theory we know that $-\log(1 - U_{n:n}) - \log n$
has a limit distribution of the Gumbel type, thus $M_n$ can be approximated by the number
of jumps of $(S_t)$ before crossing level $\log n$.
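
The quantities $K_n$ and $M_n$ can be simulated directly from the paintbox. The sketch below uses the fact that $-\log(1 - U)$ is a standard exponential variable for uniform $U$, so balls become exponential points and the gap boundaries become partial sums of $-\log(1 - W)$. The function name, the callback `sample_w` and the choice $W \sim \mathrm{beta}(1, \theta)$ in the usage lines are assumptions made for illustration.

```python
import bisect
import math
import random


def stick_breaking_counts(n, sample_w, rng):
    """Return (K_n, M_n) for one realisation of a stick-breaking composition.

    sample_w(rng) must return one stick-breaking factor W in (0, 1).
    """
    # Balls on the logarithmic scale: -log(1 - U_j) is exponential with mean 1.
    balls = sorted(rng.expovariate(1.0) for _ in range(n))
    # Gap boundaries S_1 < S_2 < ..., generated until they pass the largest ball.
    levels, s = [], 0.0
    while s <= balls[-1]:
        s += -math.log1p(-sample_w(rng))
        levels.append(s)
    # 0-based index of the gap containing each ball; distinct indices are occupied gaps.
    occupied = {bisect.bisect_left(levels, b) for b in balls}
    return len(occupied), len(levels)  # the last generated gap holds the maximum


theta = 2.0
rng = random.Random(5)
k_n, m_n = stick_breaking_counts(10_000, lambda r: r.betavariate(1.0, theta), rng)
print(k_n, m_n, theta * math.log(10_000))
```

The difference $M_n - K_n$ returned by this sketch is the number of empty gaps to the left of the maximum.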

It should be noted that exponential decay of nonrandom frequencies, as for the
geometric distribution, implies oscillatory asymptotics in the occupancy problem [10],
[1]. With stick-breaking the oscillations do not appear, since the main variability comes
from the randomness in the frequencies themselves, so the variability coming from sampling is
dominated.

Consider a renewal process with distribution for spacings like that of $-\log(1 - W)$. If
the moments are finite, $m < \infty$, $\sigma^2 < \infty$, then a standard result from the renewal theory
implies that the number of renewals on $[0, \log n]$ is approximately normal for large $n$, with
the expected value asymptotic to $(\log n)/m$. The same is valid for $M_n$, and under the
additional assumption $m_1 < \infty$ also for $K_n$ (see [17]). Under weaker assumptions on the
moments, the possible asymptotics correspond to other limit theorems of renewal theory,
as shown in [20]:

Theorem 6.1. Suppose the distribution of $-\log(1 - W)$ is nonlattice with $m_1 < \infty$.
The following assertions are equivalent.

(i) There exist constants $a_n, b_n$ with $a_n > 0$ and $b_n \in \mathbb R$ such that, as $n \to \infty$, the
variable $(K_n - b_n)/a_n$ converges weakly to some non-degenerate and proper distribution.

(ii) The distribution $\tilde\nu$ of $-\log(1 - W)$ either belongs to the domain of attraction of a
stable law, or the function $\tilde\nu[x, \infty]$ slowly varies as $x \to \infty$.

Furthermore, this limiting distribution of $(K_n - b_n)/a_n$ is as follows.

(a) If $\sigma^2 < \infty$, then for $b_n = m^{-1}\log n$ and $a_n = (m^{-3}\sigma^2\log n)^{1/2}$ the limiting distribution
is standard normal.

(b) If $\sigma^2 = \infty$ and

$$\int_x^1 (\log y)^2\,\mathbb P(1 - W \in dy) \sim \ell(-\log x) \quad \text{as } x \to 0,$$

for some $\ell$ slowly varying at $\infty$, then for $b_n = m^{-1}\log n$, $a_n = m^{-3/2} c_{\lfloor \log n \rfloor}$ and $c_n$
any sequence satisfying $\lim_{n\to\infty} n\,\ell(c_n)/c_n^2 = 1$, the limiting distribution is standard
normal.

(c) Assume that the relation

$$\mathbb P(1 - W \le x) \sim (-\log x)^{-\gamma}\,\ell(-\log x) \quad \text{as } x \to 0, \qquad (39)$$

holds with $\ell$ slowly varying at $\infty$ and $\gamma \in [1, 2)$, and assume that $m < \infty$ if
$\gamma = 1$. Then for $b_n = m^{-1}\log n$, $a_n = m^{-(\gamma+1)/\gamma} c_{\lfloor \log n \rfloor}$ and $c_n$ any sequence
satisfying $\lim_{n\to\infty} n\,\ell(c_n)/c_n^{\gamma} = 1$, the limiting distribution is $\gamma$-stable with characteristic
function

$$\tau \mapsto \exp\{-|\tau|^{\gamma}\,\Gamma(1 - \gamma)(\cos(\pi\gamma/2) + i\sin(\pi\gamma/2)\,\mathrm{sgn}(\tau))\}, \quad \tau \in \mathbb R.$$

(d) Assume that $m = \infty$ and the relation (39) holds with $\gamma = 1$. Let $c$ be any positive
function satisfying $\lim_{x\to\infty} x\,\ell(c(x))/c(x) = 1$ and set

$$\psi(x) := x\int_{\exp(-c(x))}^{1} \mathbb P(1 - W \le y)/y\,dy.$$

Let $b$ be any positive function satisfying $b(\psi(x)) \sim \psi(b(x)) \sim x$ (asymptotic inverse
to $\psi$). Then, with $b_n = b(\log n)$ and $a_n = b(\log n)\,c(b(\log n))/\log n$, the limiting
distribution is 1-stable with characteristic function

$$\tau \mapsto \exp\{-|\tau|(\pi/2 - i\log|\tau|\,\mathrm{sgn}(\tau))\}, \quad \tau \in \mathbb R. \qquad (40)$$

(e) If the relation (39) holds with $\gamma \in [0, 1)$ then, for $b_n = 0$ and $a_n := \log^{\gamma} n/\ell(\log n)$,
the limiting distribution is the scaled Mittag-Leffler law $\theta_\gamma$ (exponential, if $\gamma = 0$)
characterised by the moments

$$\int_0^\infty x^n\,\theta_\gamma(dx) = \frac{n!}{\Gamma^n(1 - \gamma)\,\Gamma(1 + n\gamma)}, \quad n \in \mathbb N.$$

Sketch of proof The results are first derived for $M_n$ by adopting asymptotics from the
renewal theory. To pass to $K_n$ it is shown, under the condition $m_1 < \infty$, that the variable
$M_n - K_n$ (the number of empty boxes to the left of $U_{n:n}$) converges in distribution and
in the mean to a random variable with expected value $m_1/m$. □
Example Suppose $W$ has a beta density fll2p . The moments are easily computable as
$m = \Psi(\theta + \gamma) - \Psi(\theta)$, $m_1 = \Psi(\theta + \gamma) - \Psi(\gamma)$, $\sigma^2 = \Psi'(\theta) - \Psi'(\theta + \gamma)$ (with $\Psi = \Gamma'/\Gamma$
denoting the logarithmic derivative of the gamma function). We are therefore in the
case (a) of Theorem 6.1, hence $K_n$ is asymptotically normal with $\mathbb E[K_n] \sim m^{-1}\log n$ and
$\mathrm{Var}[K_n]$ of the same order.

The instance $\gamma = 1$ recovers well-known asymptotics of Ewens' partitions, which have
$K_n \sim \theta\log n$ a.s. In this case the limit law of the number of empty gaps $M_n - K_n$ has
probability generating function $z \mapsto \Gamma(\theta + 1)\Gamma(\theta + 1 - z\theta)/\Gamma(1 + 2\theta - \theta z)$ (which identifies
a mixture of Poisson distributions, see [21]).

Example Suppose the law of $W$ is given by $\mathbb P(1 - W \le x) = (1 - \log x)^{-1}$, $x \in (0, 1)$. It
can be checked that $m_1 < \infty$. Here (39) holds with $\gamma = 1$ and $m = \infty$, hence the case (d)
applies and

$$\frac{(\log\log n)^2}{\log n}\,K_n - \log\log n - \log\log\log n$$

converges to a 1-stable law with characteristic function (40). The number of empty boxes
$M_n - K_n$ converges in probability to 0.

Under assumptions $m < \infty$, $m_1 < \infty$ the limit behaviour of the $K_{n,r}$'s is read from a limiting
occupancy model [22]. To describe the limit we pass to the dual composition, generated
by right-to-left stick-breaking $\hat{\mathcal R} = \{V_1 \cdots V_i : i \ge 1\}$ with independent $1 - V_i \stackrel{d}{=} W$. Let
$(X_{n,1}, X_{n,2}, \ldots)$ be the occupancy numbers of the gaps in the left-to-right order; this is a
random weak composition (0's allowed) of $n$ with $X_{n,1} > 0$ and $X_{n,j} \ge 0$ for $j > 1$. By
inflating $[0, 1]$ with factor $n$, the uniform sample converges as a point process to a unit
Poisson process (balls). On the other hand, $n\hat{\mathcal R}$ converges to a self-similar point process
$\mathcal Z$, whose gaps play the role of boxes. From this, the occupancy vector $(X_{n,1}, X_{n,2}, \ldots)$
acquires a limit, which is an occupancy vector $(X_1, X_2, \ldots)$ derived from the limiting point
processes. The limit distribution of the occupancy vector is

$$\mathbb P(X_1 = \lambda_1, \ldots, X_k = \lambda_k) = \frac{1}{m\,(\lambda_1 + \cdots + \lambda_k)}\prod_{j=1}^{k} q(\Lambda_j : \lambda_j),$$

where $\lambda_1 > 0$, $\lambda_j \ge 0$, $\Lambda_j = \lambda_1 + \cdots + \lambda_j$ and $q(n : m) = \binom{n}{m}\mathbb E[W^m(1 - W)^{n-m}]$.
Correspondingly, the $K_{n,r}$'s jointly converge in distribution to $\#\{j : X_j = r\}$, $r = 1, 2, \ldots$. The
convergence also holds for Knfl, defined as the number of g4 

If $W \stackrel{d}{=} \mathrm{beta}(1, \theta)$ then $\mathcal Z$ is a Poisson process with density $\theta/x$. Then the $K_{n,r}$'s converge
in distribution to independent Poisson variables with mean $\theta/r$, which is a well known
property of Ewens' partitions [2]. It is a challenging open problem to identify the limit
laws of the $K_{n,r}$'s for a general distribution of $W$.

6.2 Regular variation: $0 < \alpha \le 1$

Suppose (37) holds with $0 < \alpha \le 1$. This case is treated by reducing the occupancy
problem to counting the gaps of given sizes. For $x > 0$ let $N_x(t)$ be the number of gaps of
size at least $x$ in the partial range of the multiplicative subordinator $(1 - \exp(-S_u),\ 0 \le u < t)$.
Introduce the exponential functionals

$$I_\alpha(t) := \int_0^t \exp(-\alpha S_u)\,du, \qquad I_\alpha := I_\alpha(\infty).$$

The distribution of $I_\alpha(\infty)$ is determined by the formula for the moments [8]

$$\mathbb E(I_\alpha)^k = \frac{k!}{\prod_{j=1}^{k}\Phi(\alpha j)},$$

where $\Phi$ is the Laplace exponent of the subordinator $(S_t)$.
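
A short Python sketch of the moment formula, assuming it has the form $\mathbb E(I_\alpha)^k = k!/\prod_{j=1}^{k}\Phi(\alpha j)$. The Laplace exponent is passed as a function, and the pure-drift exponent $\Phi(\rho) = \rho$ (so that $S_t = t$ and $I_\alpha = 1/\alpha$ is deterministic) is used only as a sanity check; the function name is hypothetical.

```python
from math import factorial, isclose


def exp_functional_moment(phi, alpha, k):
    """k-th moment of I_alpha, computed as k! / prod_{j=1..k} phi(alpha * j)."""
    prod = 1.0
    for j in range(1, k + 1):
        prod *= phi(alpha * j)
    return factorial(k) / prod


# Pure drift S_t = t: I_alpha = 1/alpha, so the k-th moment must equal alpha^(-k).
alpha = 0.5
for k in (1, 2, 3):
    print(k, exp_functional_moment(lambda rho: rho, alpha, k), alpha ** (-k))
```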

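Theorem 6.2. [30] Suppose the Lévy measure fulfills (37). Then for $0 < t \le \infty$

$$\frac{N_x(t)}{\ell(1/x)\,x^{-\alpha}} \to I_\alpha(t) \quad \text{a.s.}, \quad x \downarrow 0.$$

For $\alpha = 1$ we also introduce

$$\ell_1(z) := \int_z^\infty u^{-1}\,\ell(u)\,du,$$

which is another function of slow variation, satisfying $\ell_1(z) \gg \ell(z)$ as $z \to \infty$.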

Sketch of proof Let $\tilde N_y(t)$ be the number of gaps of size at least $y$ in the range of the (additive)
subordinator restricted to $[0, t]$. By the Lévy-Itô construction of $(S_t)$ from a Poisson process, we
have the strong law $\tilde N_y(t) \sim \tilde\nu[y, \infty]\,t$ a.s. for $y \downarrow 0$. A small gap $(s, s + x)$ is mapped by
the function $s \mapsto 1 - e^{-s}$ to a gap of size $e^{-s}x$, from which the result for finite $t$ follows by
integration. Special tail estimates are required to conclude that similar asymptotics hold
with integration extended to $[0, \infty]$. □

The instance $\alpha = 1$ may be called in this context the case of rapid variation. In this case
$\ell$ in (37) must decay at $\infty$ sufficiently fast, in order to satisfy $\Phi(1) < \infty$.

Conditioning on the frequencies $S = (s_j)$ embeds the small-block problem in the
framework of the classical occupancy problem: $n$ balls are thrown in an infinite series of
boxes, with positive probability $s_j$ of hitting box $j$. By a result of Karlin [35], the number
of occupied boxes is asymptotic to the expected number, from which $K_n \sim \mathbb E[K_n \mid \mathcal R]$
a.s., and a similar result holds for $K_{n,r}$ under the regular variation with index $\alpha > 0$.
Combining this with Theorem 6.2 we have (see [30])

Theorem 6.3. Suppose the Lévy measure fulfills (37). Then, uniformly in $0 < t \le \infty$,
as $n \to \infty$, the convergence holds almost surely and in the mean:

$$\frac{\mathcal{K}_n(t)}{\Gamma(1-\alpha)\,n^{\alpha}\ell(n)} \to I_\alpha(t), \qquad \frac{\mathcal{K}_{n,r}(t)}{n^{\alpha}\ell(n)} \to \frac{\alpha\,\Gamma(r-\alpha)}{r!}\, I_\alpha(t),$$

the first for $0 < \alpha < 1$, the second for $0 < \alpha < 1$ and $r \ge 1$, or $\alpha = 1$ and $r > 1$.
Similarly, $\mathcal{K}_n(t)/(n\ell_1(n)) \to I_1(t)$ for $\alpha = 1$.

Thus $K_n$, $K_{n,r}$ have the same order of growth if $0 < \alpha < 1$. In the case $\alpha = 1$ of rapid
variation, singletons dominate, $K_{n,1} \sim K_n$, while all other $K_{n,r}$'s with $r > 1$ are of the
same order of growth, which is smaller than that of $K_n$.

Example The subordinator associated with the two-parameter family of compositions
has $\Phi$ given by (26), hence

$$\mathbb{E}\,(I_\alpha)^k = \frac{(\alpha+\theta)(2\alpha+\theta)\cdots((k-1)\alpha+\theta)\,\Gamma(\theta+1)}{\Gamma(k\alpha+\theta)\,\{\alpha\,\Gamma(1-\alpha)\}^k},$$

which for $\theta = 0$ and $0 < \alpha < 1$ identifies the law of $I_\alpha$ as a Mittag-Leffler distribution.
Theorem 6.3 recovers in this instance known asymptotics [l3] related to the local times
of Bessel bridges.
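
The identification with the Mittag-Leffler law can be checked numerically from the moment formula of the two-parameter family. The sketch below is illustrative only, with hypothetical function names. It relies on the standard fact that a Mittag-Leffler variable with index alpha has k-th moment k!/Gamma(1 + k alpha), and on the observation that for theta = 0 the moments of I_alpha multiplied by Gamma(1 - alpha)^k reduce to exactly this expression.

```python
import math


def moment_two_parameter(k, alpha, theta):
    """k-th moment of I_alpha for the two-parameter family:
    (a+t)(2a+t)...((k-1)a+t) * Gamma(t+1) / (Gamma(k*a+t) * (a*Gamma(1-a))**k),
    with a = alpha and t = theta."""
    prod = 1.0
    for j in range(1, k):
        prod *= j * alpha + theta
    denom = math.gamma(k * alpha + theta) * (alpha * math.gamma(1.0 - alpha)) ** k
    return prod * math.gamma(theta + 1.0) / denom


def mittag_leffler_moment(k, alpha):
    """k-th moment of the Mittag-Leffler distribution with index alpha."""
    return math.factorial(k) / math.gamma(1.0 + k * alpha)


if __name__ == "__main__":
    alpha = 0.5
    for k in range(1, 6):
        # Rescale I_alpha by Gamma(1 - alpha) before comparing.
        scaled = moment_two_parameter(k, alpha, 0.0) * math.gamma(1.0 - alpha) ** k
        print(k, scaled, mittag_leffler_moment(k, alpha))
```

The two printed columns should agree up to rounding, which reflects that for theta = 0 the variable Gamma(1 - alpha) I_alpha has the Mittag-Leffler moments.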

One generalisation of Theorems 6.2 and 6.3 is obtained by taking for paintbox the
range of a process $\phi(S_t)$, where $\phi : \mathbb{R}_+ \to [0, 1]$ is a smooth monotone function, with not
too bad behaviour at $\infty$. The generalised power laws hold also for this class of partitions,
with the only difference that the exponential functionals should be replaced by integrals
$\int_0^t \{\phi'(S_u)\}^{\alpha}\,du$, see [30].



6.3 Slow variation: $\alpha = 0$

The case of infinite Lévy measure with slowly varying tail $\tilde\nu[y, \infty] \sim \ell(1/y)$ $(y \downarrow 0)$
is intermediate between finite $\tilde\nu$ and the case of proper regular variation. In this case
$K_{n,r} \to \infty$ (like in the case $\alpha > 0$) but $K_{n,r} \ll K_n$ (like in the case of finite $\tilde\nu$). Following
Barbour and Gnedin [3] we will exhibit a further wealth of possible modes of asymptotic
behaviour appearing in this transitional regime.

We assume that the first two moments of the subordinator are finite. The assumption
about the moments is analogous to the instance (a) of Theorem 6.1 in the case of finite
$\tilde\nu$. The results will be formulated for the case

$$\mathbb{E}[S_t] = t, \qquad \mathrm{Var}[S_t] = s^2 t,$$

which can always be achieved by a linear time scaling. Indeed, a general subordinator $S_t$
with

$$m := \mathbb{E}[S_1] = \int_0^\infty x\,\tilde\nu(dx), \qquad v^2 := \mathrm{Var}[S_1] = \int_0^\infty x^2\,\tilde\nu(dx)$$

should be replaced by $S_{t/m}$, then $s^2 = v^2/m$. Because the linear time change does not
affect the range of the process, it does not change the distribution of $K_n$, $K_{n,r}$.

For the sample (balls) we take a Poisson point process on $[0, 1]$ with intensity $n > 0$.
This is the same as assuming a Poisson($n$) number of uniform points thrown on $[0,1]$.
To avoid new notations, we further understand $n$ as the intensity parameter, and use the
old notation $\mathcal{K}_n(t)$ to denote the number of blocks of the (poissonised) subcomposition
on the interval $[0, 1 - \exp(-S_t)]$. The convention for $K_n$ is the same. For large samples
the poissonised quantities are very close to their fixed-$n$ counterparts, but the Poisson
framework is easier to work with.
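
The poissonised sampling scheme can be sketched in a few lines. The code below is an illustration under stated assumptions, not part of the source: the paintbox is represented by a hypothetical sorted list of boundary points in [0,1], and the Poisson process of intensity n is generated through exponential spacings, which yields a Poisson(n) number of points that are uniformly placed given their number.

```python
import bisect
import random


def poisson_sample(n, rng):
    """Points of a Poisson process on [0, 1] with intensity n,
    generated in increasing order from exponential spacings."""
    points = []
    x = rng.expovariate(n)
    while x < 1.0:
        points.append(x)
        x += rng.expovariate(n)
    return points


def block_counts(boundaries, points):
    """Group sample points by the gap that contains them.

    boundaries: sorted list of points of the closed set (hypothetical input);
    gap number g lies between boundaries[g - 1] and boundaries[g].
    Returns a dict {gap index: number of sample points in that gap};
    its length is the number of blocks."""
    counts = {}
    for p in points:
        g = bisect.bisect_right(boundaries, p)
        counts[g] = counts.get(g, 0) + 1
    return counts


if __name__ == "__main__":
    rng = random.Random(7)
    boundaries = [0.5, 0.75, 0.875, 0.9375]  # toy paintbox, for illustration
    sample = poisson_sample(20, rng)
    blocks = block_counts(boundaries, sample)
    print("sample size:", len(sample), "number of blocks:", len(blocks))
```

Because disjoint intervals receive independent Poisson numbers of points, the block counts in different parts of [0,1] decouple, which is the convenience of the Poisson framework mentioned above.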

The total number of blocks is the terminal value $K_n = \mathcal{K}_n(\infty)$ of the increasing
process $\mathcal{K}_n(t)$. Poissonisation makes the subcompositions within $[0, 1 - \exp(-S_t)]$ and
$[1 - \exp(-S_t), 1]$ conditionally independent given $S_t$, hence $\mathcal{K}_n(t)$ and $\mathcal{K}_n(\infty) - \mathcal{K}_n(t)$
are also conditionally independent. The consideration can be restricted to the time range
$t < \tau_n$, where $\tau_n := \inf\{t : S_t > \log n\}$ is the passage time through $\log n$, since after this
time the number of blocks produced is bounded by a Poisson(1) variable.

Define the poissonised Laplace exponent

$$\Phi_0(n) := \int_0^\infty \{1 - \exp(-n(1 - e^{-y}))\}\,\tilde\nu(dy).$$

For large $n$, we have $\Phi_0(n) \sim \Phi(n)$, but the former is more convenient to deal with, since
it enters naturally the compensator of $(\mathcal{K}_n(t),\ t \ge 0)$,

$$A_n(t) := \int_0^t \Phi_0(n\exp(-S_u))\,du.$$



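Introduce

$$\Phi_k(n) := \int_0^\infty \{\Phi_0(ne^{-s})\}^k\,ds = \int_0^n \{\Phi_0(s)\}^k\,\frac{ds}{s}, \qquad k = 1, 2.$$

By the assumption of slow variation and from $\Phi_0(n) \to \infty$ it follows that the $\Phi_k$'s are also
slowly varying, and satisfy

$$\Phi_1(n) \ll \Phi_2(n), \qquad n \to \infty.$$

These functions give, asymptotically, the moments of $K_n$ and of the terminal value of the
compensator

$$\mathbb{E}[K_n] = \mathbb{E}[A_n(\infty)] \sim \Phi_1(n), \qquad \mathrm{Var}[K_n] \sim \mathrm{Var}[A_n(\infty)] \sim s^2\,\Phi_2(n), \qquad n \to \infty.$$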

Remark In the stick-breaking case $\tilde\nu[0, \infty] = 1$ the asymptotics of $\mathrm{Var}[A_n(\infty)]$ and
$\mathrm{Var}[K_n]$ are different, because the asymptotic relation $\Phi_1(n) \ll \Phi_2(n)$ is not valid.
Instead, we have $\mathrm{Var}[A_n(\infty)] \sim v^2 m^{-3}\log n$, and $\mathrm{Var}[K_n] \sim \sigma^2 m^{-3}\log n$ with $\sigma^2 = v^2 - m^2$.

The following approximation lemma reduces the study of $\mathcal{K}_n(t)$ to the asymptotics of
the compensator.

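Lemma 6.4. We have, as $n \to \infty$,

$$\mathbb{E}[K_n - A_n(\infty)]^2 \sim \Phi_1(n),$$

and for any $b_n$ such that $\Phi_1(n)/b_n^2 \to 0$,

$$\lim_{n\to\infty} \mathbb{P}\Big[\sup_{0 \le t \le \infty} |\mathcal{K}_n(t) - A_n(t)| > b_n\Big] = 0.$$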



Sketch of proof Noting that $\mathcal{K}_n(t) - A_n(t)$ is a square integrable martingale with unit
jumps, we derive $\mathbb{E}[K_n - A_n(\infty)]^2 = \mathbb{E}[A_n(\infty)]$, from which the first claim follows. The
second follows by application of Kolmogorov's inequality. □

From this the law of large numbers is derived:

Theorem 6.5. As $n \to \infty$, we have $K_n \sim A_n(\infty) \sim \Phi_1(n)$ almost surely and in the
mean.

For more delicate results we need to keep fluctuations of the compensator under control.
For this purpose we adopt one further assumption, akin to de Haan's second order regular
variation [9]. As in Karamata's representation of slowly varying functions [9], write $\Phi_0$ as

$$\Phi_0(s) = \Phi_0(1)\exp\left(\int_1^s \frac{dz}{zL(z)}\right),$$

where

$$L(n) := \frac{\Phi_0(n)}{n\,\Phi_0'(n)}.$$



The key assumption. There exist constants $c_0, n_0 > 0$ such that

$$\left|\frac{nL'(n)}{L(n)}\right| < \frac{c_0}{\log n} \quad \text{for } n > n_0. \qquad (41)$$



In particular, $L$ is itself slowly varying, which is equivalent to the slow variation of $n\Phi_0'(n)$
as $n \to \infty$. Note that the faster $L$ grows, the slower $\Phi_0$ grows. The assumption allows us to
limit local variations of $\Phi_0$, which makes it possible to approximate the compensator by a
simpler process

$$\int_0^{t\wedge\log n}\left\{\Phi_0(ne^{-u}) - \frac{S_u - u}{L(ne^{-u})}\,\Phi_0(ne^{-u})\right\}du,$$

in which the subordinator enters linearly. This in turn allows us to derive the limit behaviour
of the compensator from the functional CLT for $(S_t)$ itself.

6.3.1 Moderate growth case 

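This is the case $L(n) \asymp \log n$. We shall state weak convergence in the space $D_0(\mathbb{R}_+)$ of
càdlàg functions with finite limits at $\infty$.
The time-changed scaled process

$$(\Phi_0(n)\sqrt{\log n})^{-1}\left(\mathcal{K}_n(u\log n) - \log n\int_0^{u\wedge 1}\Phi_0(n^{1-v})\,dv\right)$$

converges weakly to the process

$$Y^{(1)}(u) := s\int_0^{u\wedge 1} h^{(1)}(v)\,B_v\,dv,$$

where $(B_u)$ is the BM and

$$h^{(1)}(v) := \lim_{n\to\infty}\frac{\Phi_0(n^{1-v})\,\log n}{\Phi_0(n)\,L(n^{1-v})}.$$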

In particular, if $L(n) \sim \gamma\log n$ for some $\gamma > 0$, we have

$$Y^{(1)}(u) = s\int_0^{u\wedge 1}\gamma^{-1}(1 - v)^{(1-\gamma)/\gamma}\,B_v\,dv.$$

Example Consider a subordinator with Laplace exponent $\Phi(n) \sim c\log^{1/\gamma} n$, $m = \mathbb{E}[S_1] =
\Phi'(0)$, $v^2 = \mathrm{Var}[S_1] = -\Phi''(0)$. A CLT for $K_n$ holds with standard scaling and centering
by the moments

$$\mathbb{E}[K_n] \sim \frac{c\,\log^{1+1/\gamma} n}{m(1 + 1/\gamma)}, \qquad \mathrm{Var}[K_n] \sim \frac{c^2 v^2\,\log^{1+2/\gamma} n}{m^3(1 + 2/\gamma)}.$$

A special case is the gamma subordinator with

$$\tilde\nu(dy) = \theta e^{-\theta y}\,dy/y, \qquad \Phi(n) = \theta\log(1 + n/\theta), \qquad m = 1, \quad v^2 = 1/\theta.$$

Some generalisations are considered in [29].
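
The constants quoted for the gamma subordinator can be verified numerically. The sketch below is an illustration only, with hypothetical function names: it integrates against the Lévy density theta e^{-theta y}/y by a plain midpoint rule on a truncated range (an assumed numerical scheme, chosen for simplicity) and compares the results with theta log(1 + n/theta), m = 1 and v^2 = 1/theta.

```python
import math


def midpoint(f, a, b, steps=200_000):
    """Composite midpoint rule; it never evaluates f at the endpoints,
    so the removable singularity of the integrand at 0 causes no trouble."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def levy_density(y, theta):
    """Density of the gamma subordinator's Levy measure."""
    return theta * math.exp(-theta * y) / y


def phi_numeric(n, theta):
    """Laplace exponent: integral of (1 - exp(-n y)) against the Levy measure.
    The range is truncated at 60/theta, where the density is negligible."""
    upper = 60.0 / theta
    return midpoint(lambda y: -math.expm1(-n * y) * levy_density(y, theta), 0.0, upper)


def phi_closed_form(n, theta):
    return theta * math.log(1.0 + n / theta)


def first_two_moments(theta):
    """m = integral of y and v^2 = integral of y^2 against the Levy measure."""
    upper = 60.0 / theta
    m = midpoint(lambda y: y * levy_density(y, theta), 0.0, upper)
    v2 = midpoint(lambda y: y * y * levy_density(y, theta), 0.0, upper)
    return m, v2


if __name__ == "__main__":
    theta = 2.0
    print(phi_numeric(10.0, theta), phi_closed_form(10.0, theta))
    print(first_two_moments(theta), "expected (1, 1/theta) =", (1.0, 1.0 / theta))
```

In the notation of the example, the gamma subordinator corresponds to $\gamma = 1$ and $c = \theta$, since $\theta\log(1 + n/\theta) \sim \theta\log n$.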

6.3.2 Fast growth case 

This case is defined by the conditions $L(n) \to \infty$, $L(n) \ll \log n$, then $\Phi_0$ grows faster
than any power of the logarithm. For instance $\Phi_0(n) \asymp \exp(\log^{\gamma} n)$ with $0 < \gamma < 1$. The
scaled process

$$\mathcal{K}_n^{(2)}(u) := \big(\sqrt{L(n)}\,\Phi_0(n)\big)^{-1}\left(\mathcal{K}_n(uL(n)) - L(n)\int_0^{u\wedge(\log n/L(n))}\Phi_0(n\exp(-vL(n)))\,dv\right)$$

converges weakly to

$$Y^{(2)}(u) := s\int_0^u e^{-v}B_v\,dv.$$

6.3.3 Slow growth case 

Suppose that $L(n) = c(n)\log n$, where $c(n) \to \infty$ but slowly enough to have

$$\int_2^\infty \frac{dn}{c(n)\,n\log n} = \infty$$

(otherwise $\tilde\nu$ is a finite measure). For instance we can have $c(n) \asymp \log\log n$ (in which case
$\Phi(n) \asymp \log\log n$), but the growth $c(n) \asymp \log^{\gamma} n$ with $\gamma > 0$ is excluded. Like in the case
of finite $\tilde\nu$, almost all variability of $K_n$ comes from the range of times very close to the
passage time $\tau_n$.

The key quantity describing the process $\mathcal{K}_n(t)$ in this case is the family of integrals

$$\int_{(\tau_n - t)_+}^{\tau_n}\Phi_0(e^v)\,dv, \qquad t \ge 0,$$

where $\tau_n$ is the passage time at level $\log n$. The randomness enters here only through $\tau_n$,
which is approximately normal for large $n$ with $\mathbb{E}[\tau_n] \sim \log n$, $\mathrm{Var}[\tau_n] \sim s^2\log n$. The
process

$$(\Phi_0(n)\sqrt{\log n})^{-1}\left(\mathcal{K}_n(t) - \int_{(\log n - t)_+}^{\log n}\Phi_0(e^v)\,dv\right)$$

is approximated by

$$s\eta - (\Phi_0(n)\sqrt{\log n})^{-1}\int_{(\log n - t)_+}^{(\log n - t + s\eta\sqrt{\log n})_+}\Phi_0(e^v)\,dv,$$

where $\eta$ is a standard normal random variable.

6.3.4 Gamma subordinators and the like

Asymptotics of $K_{n,r}$ is known [29] for gamma-like subordinators with logarithmic
behaviour $\tilde\nu[y, \infty] \sim -c\log y$ for $y \to 0$, under some additional assumptions on the tail of
$\tilde\nu$ for $y$ near $0$ and $\infty$. This case is well suited for application of singularity analysis to
formulas like

$$\int_0^\infty n^{s-1}\varphi_r(n)\,dn = \frac{\Gamma(r+s)}{r!}\,\frac{\Phi(-s : -s)}{\Phi(-s)}, \qquad -1 < \Re s < 0,$$

for the Mellin transform of the expected value $\varphi_r(n)$ of poissonised $K_{n,r}$. In this formula
$\Phi(-s)$ and $\Phi(-s : -s)$ are the analytical continuations in the complex domain of the
Laplace exponent $\Phi(n)$ and the bivariate function $\Phi(n : n)$, respectively.
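For the moments we have

$$\mathbb{E}[K_n] \sim \frac{c\,\log^2 n}{2m}, \qquad \mathrm{Var}[K_n] \sim \frac{c^2 v^2\,\log^3 n}{3m^3},$$

$$\mathbb{E}[K_{n,r}] \sim \frac{c\,\log n}{rm}, \qquad \mathrm{Var}[K_{n,r}] \sim \left(\frac{c^2 v^2}{r^2 m^3} + \frac{c}{rm}\right)\log n,$$

with $m = \mathbb{E}[S_1]$ and $v^2 = \mathrm{Var}[S_1]$ as before.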

The CLT for $K_n$ is an instance of the moderate growth case in Section 6.3.1.
As $n \to \infty$, the infinite sequence of scaled and centered block counts

$$\frac{K_{n,r} - \mathbb{E}[K_{n,r}]}{\sqrt{\log n}}, \qquad r = 1, 2, \ldots,$$

converges in distribution to a multivariate Gaussian sequence with the covariance matrix

$$\frac{c^2 v^2}{m^3\,ij} + 1(i = j)\,\frac{c}{im}, \qquad i, j = 1, 2, \ldots.$$

See [29] for explicit assumptions on $\tilde\nu$ in this logarithmic case and further examples,
including computations for the subordinator in Example 3.5.

The behaviour of the $K_{n,r}$'s for other slowly varying infinite $\tilde\nu$ remains an open problem.